(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 



(19) World Intellectual Property Organization 
International Bureau 

(43) International Publication Date 
19 September 2002 (19.09.2002) 




PCT 






(10) International Publication Number 

WO 02/073193 A1






(51) International Patent Classification7: G01N 33/48

(21) International Application Number: PCT/US02/09051

(22) International Filing Date: 12 March 2002 (12.03.2002) 

(25) Filing Language: English 

(26) Publication Language: English 



(30) Priority Data: 60/275,144, 12 March 2001 (12.03.2001), US



(71) Applicant (for all designated States except US): BOARD
OF REGENTS, THE UNIVERSITY OF TEXAS SYSTEM [US/US];
201 W. 7th Street, Austin, TX 78701 (US).

(72) Inventors; and 

(75) Inventors/Applicants (for US only): FOX, Robert, O.
[US/US]; 8 Quintana Drive, Galveston, TX 77554 (US).
YANG, Huan-wang [CN/US]; 17 Kearny Ave. 5B, Edison,
NJ 08817 (US).

(74) Agent: ACOSTA, Melissa, W.; Fulbright & Jaworski
L.L.P., 1301 McKinney, Suite 5100, Houston, TX
77010-3095 (US).

(81) Designated States (national): AE, AG, AL, AM, AT, AU,
AZ, BA, BB, BG, BR, BY, BZ, CA, CH, CN, CO, CR, CU,
CZ, DE, DK, DM, DZ, EC, EE, ES, FI, GB, GD, GE, GH,
GM, HR, HU, ID, IL, IN, IS, JP, KE, KG, KP, KR, KZ, LC,
LK, LR, LS, LT, LU, LV, MA, MD, MG, MK, MN, MW,
MX, MZ, NO, NZ, OM, PH, PL, PT, RO, RU, SD, SE, SG,
SI, SK, SL, TJ, TM, TN, TR, TT, TZ, UA, UG, US, UZ,
VN, YU, ZA, ZM, ZW.

(84) Designated States (regional): ARIPO patent (GH, GM, 
KE, LS, MW, MZ, SD, SL, SZ, TZ, UG, ZM, ZW), 
Eurasian patent (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), 

[Continued on next page]



(54) Title: COMPUTER-BASED STRATEGY FOR PEPTIDE AND PROTEIN CONFORMATIONAL ENSEMBLE ENUMERATION AND LIGAND AFFINITY ANALYSIS







(57) Abstract: The present invention provides a method to generate and analyse ensembles of peptide and protein conformers and predict the affinity of a given conformation of the peptide or protein for a target protein.






European patent (AT, BE, CH, CY, DE, DK, ES, FI, FR,
GB, GR, IE, IT, LU, MC, NL, PT, SE, TR), OAPI patent
(BF, BJ, CF, CG, CI, CM, GA, GN, GQ, GW, ML, MR,
NE, SN, TD, TG).



Published: 

- with international search report 



- before the expiration of the time limit for amending the
claims and to be republished in the event of receipt of
amendments

For two-letter codes and other abbreviations, refer to the "Guidance
Notes on Codes and Abbreviations" appearing at the beginning
of each regular issue of the PCT Gazette.






COMPUTER-BASED STRATEGY FOR PEPTIDE AND PROTEIN
CONFORMATIONAL ENSEMBLE ENUMERATION AND LIGAND AFFINITY ANALYSIS

BACKGROUND OF THE INVENTION 

[0001] This application claims priority to U.S. Provisional Application No.
60/275,144, which was filed on March 12, 2001.

[0002] The U.S. Government may have certain rights in the invention by virtue of a
grant from DARPA.

I. Field of the Invention 

[0003] The invention generally relates to the field of structural biology. It concerns a 
method of modeling the structure of a peptide and stabilizing the structure of that peptide by 
the insertion of an amino acid not naturally found in that position in the peptide. It also 
concerns a method for assessing the binding affinity of a peptide to a template molecule and a 
method for determining the rate of loop closure in a peptide via a disulfide bond.

II. Related Art 

[0004] The protein modeling approach of the present invention provides an efficient 
method of predicting where to insert cysteines or other amino acids in a peptide in order to 
stabilize the peptide. 

[0005] The three-dimensional structure of proteins has been determined in a number 
of ways. One of the best-known ways of determining protein structure involves the use
of the technique of x-ray crystallography. Using this technique, it is possible to elucidate the 
three-dimensional structure with good precision. Additionally, protein structure may be 
determined through the use of the techniques of neutron diffraction, or by nuclear magnetic 
resonance (NMR). 

[0006] The three-dimensional structure of many proteins may be characterized as 
having internal surfaces (directed away from the aqueous environment in which the protein is 
normally found) and external surfaces (which are exposed to the aqueous environment). 
Through the study of many natural proteins, researchers have discovered that hydrophobic 
residues (such as tryptophan, phenylalanine, leucine, isoleucine, valine, or methionine) are 
most frequently found on the internal surface of protein molecules. In contrast, hydrophilic 
residues (such as aspartate, asparagine, glutamate, glutamine, lysine, arginine, serine, and 
threonine) are most frequently found on the external protein surfaces. The amino acids 
alanine, cysteine, glycine, histidine, proline, serine, tyrosine, and threonine are encountered
with more nearly equal frequency on both the internal and external protein surfaces.

[0007] The biological properties of proteins depend directly on the protein's
three-dimensional (3D) conformation. The conformation determines the activity of enzymes, the
capacity and specificity of binding proteins, and the structural attributes of receptor
molecules. Each protein has an astronomical number of possible conformations (about 10^16
for a small protein of 100 residues), and there has been no reliable method for picking the one
conformation that predominates in aqueous solution. A second difficulty is that there are no 
accurate and reliable force laws for the interaction of one part of a protein with another part 
and with water. These and other factors have contributed to the enormous complexity of 
determining the most probable relative location of each residue in a known protein sequence. 

[0008] The protein folding problem, the problem of determining a protein's three- 
dimensional tertiary structure from the amino acid sequence, was first formulated more than 
half a century ago. Early observations and later experiments have led to the contemporary
view that protein conformation is determined solely by the amino acid sequence and that 
there exists a unique native conformation in which residues distant in sequence but proximate 
in space engender a close-packed core enriched in hydrophobic residues. As a result of the 
revolution in molecular biology, the number of known protein sequences is about 50 times 
greater than the number of known three-dimensional protein structures. This disparity hinders 
progress in many areas of biochemistry because a protein sequence has little meaning outside 
the context of the three-dimensional structure. 

[0009] The protein modeling approach of the present invention provides an efficient 
method of predicting where to insert cysteines or other amino acids in a peptide in order to 
stabilize the peptide. The present invention provides an ensemble-based, all-atom method, the
mini-protein modeling program (MPMOD), to stabilize a protein and thereby provide higher-affinity
binding.

SUMMARY OF THE INVENTION 

[0010] Therefore, it is an objective of the present invention to provide a method to
generate and analyze ensembles of peptide and protein conformers and predict the affinity of
a given conformation of the peptide or protein for a target protein.

[0011] An embodiment of the present invention is a computer-assisted method for use
in modifying a protein comprising the steps of: inputting a peptide sequence and parameters
for analysis into a computer-assisted modeling program; randomly generating φ, ψ, ω angles;
generating a peptide backbone; performing a van der Waals check of the backbone;
calculating a solvent accessible surface based energy of all conformers; modeling the
disulfide(s); performing a van der Waals check after each rotamer is added; and calculating a
solvent accessible surface based energy of all conformers. In a further embodiment, the
protein may comprise a peptide. In yet a further embodiment, the method may comprise
performing a binding test for each conformer with a template molecule. In a further
embodiment, the method may comprise calculating the rate of disulfide bond loop closure.
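The van der Waals check recited above can be illustrated with a minimal sketch. It assumes a simple hard-sphere clash test in which two non-excluded atoms collide when their distance falls below a fraction of the sum of their van der Waals radii; the radii table, the 0.8 scale factor and the name `has_vdw_clash` are illustrative assumptions, not values or names taken from MPMOD.

```python
import math
from itertools import combinations

# Bondi-style van der Waals radii in angstroms (assumed values for illustration).
VDW_RADII = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}
# Hypothetical tolerance: a pair clashes if closer than 80% of the summed radii.
OVERLAP_SCALE = 0.8


def has_vdw_clash(atoms, excluded_pairs=frozenset()):
    """Return True if any non-excluded atom pair overlaps.

    atoms: list of (element, (x, y, z)) tuples.
    excluded_pairs: index pairs (i, j) with i < j, e.g. covalently bonded
    or 1-3 related atoms, which are not tested for overlap.
    """
    for i, j in combinations(range(len(atoms)), 2):
        if (i, j) in excluded_pairs:
            continue
        (elem_i, pos_i), (elem_j, pos_j) = atoms[i], atoms[j]
        limit = OVERLAP_SCALE * (VDW_RADII[elem_i] + VDW_RADII[elem_j])
        if math.dist(pos_i, pos_j) < limit:
            return True  # reject the conformer at the first collision
    return False
```

Returning at the first collision mirrors the reject-and-resample behavior described for the method; a production program would typically also use a spatial grid rather than testing all pairs.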

[0012] Another embodiment of the invention is a method of protein miniaturization 
comprising modeling a protein to have the necessary active site conformation using the 
method above while reducing the total number of amino acids in the protein. 

[0013] Yet another embodiment is a method of increasing binding affinity between a 
protein and a template molecule by decreasing the conformational entropy loss upon binding 
by the protein comprising the constraint of at least one loop of an unstable region of the 
protein in conformational space using the method above. 

[0014] Another embodiment of the present invention is a computer-assisted method
for use in modifying a protein comprising the steps of: inputting a peptide sequence into a
computer-assisted modeling program; inputting parameters for analysis into a
computer-assisted modeling program; generating φ, ψ, ω angles randomly in the allowed regions
of the Ramachandran maps; assigning angles to each residue of the backbone; generating the
backbone atoms N, CA and C; generating the rest of the backbone atoms; performing a van der
Waals check for each atom; modeling disulfide bonds and recording the disulfide coordinate
pairs; adding rotamers to residues; performing a van der Waals check for each rotamer;
performing a binding test with a template protein; and calculating a solvent accessible surface
based energy for each conformer. In a further embodiment, the protein may comprise a
peptide.

[0015] Another embodiment of the present invention is a computer-assisted method
for use in modifying a protein comprising the steps of: inputting a peptide sequence into a
computer-assisted modeling program; inputting parameters for analysis into a
computer-assisted modeling program; generating φ, ψ, ω angles randomly in the allowed regions
of the Ramachandran maps; assigning angles to each residue of the backbone; generating the
backbone atoms N, CA and C; generating the rest of the backbone atoms; performing a van der
Waals check against all other atoms after each atom is added; adding rotamers to residues;
checking distance pairs between atoms; modeling disulfide bonds and recording the disulfide
coordinate pairs; performing a van der Waals check for the disulfide bonds with the complete
conformer; recording the number of conformers that are able to form disulfide bonds; and
calculating a solvent accessible surface based energy for each conformer.
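The specification does not spell out the form of the solvent accessible surface based energy. One common form, assumed here, is an atomic-solvation-parameter sum, ΔG ≈ Σ σ(type) × A, where A is each atom's solvent accessible area. The parameter values and function names below are illustrative placeholders, and the per-atom areas are assumed to come from a separate SASA calculation.

```python
# Illustrative atomic solvation parameters in cal mol^-1 A^-2 (assumed
# placeholders; the specification does not list its parameter set).
SOLVATION_PARAMS = {"C": 16.0, "N": -6.0, "O": -6.0, "S": 21.0}


def sas_energy(atom_types, sasa_per_atom):
    """Solvation free energy estimate: sum over atoms of sigma(type) * area.

    atom_types: element symbols, one per atom.
    sasa_per_atom: solvent accessible surface area of each atom in A^2,
    assumed to come from a separate SASA calculation.
    """
    if len(atom_types) != len(sasa_per_atom):
        raise ValueError("one SASA value is required per atom")
    return sum(SOLVATION_PARAMS[t] * a for t, a in zip(atom_types, sasa_per_atom))


def lowest_energy_index(energies):
    """Index of the conformer with the lowest SAS-based energy."""
    return min(range(len(energies)), key=energies.__getitem__)
```

With these placeholder values, exposed carbon and sulfur raise the energy while exposed polar atoms lower it, so compact conformers that bury hydrophobic atoms score lower.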

[0016] Yet another embodiment of the present invention is a computer-assisted
method for use in modifying a protein comprising the steps of: inputting a peptide sequence
and parameters for analysis into a computer-assisted modeling program; searching
conformational space in the allowed regions of the Ramachandran plots; minimizing the N
and C termini of the conformer to be the same as the high resolution structure; checking the
handedness of the conformer; aligning the conformer to the high resolution structure; and
performing a van der Waals calculation.

[0017] Still another embodiment of the present invention is a computer-assisted
method for use in modifying a protein comprising the steps of: inputting residue numbers of
the flexible loop of a protein into a computer-assisted modeling program; inputting
parameters for analysis into a computer-assisted modeling program; generating φ, ψ, ω angles
randomly in the allowed regions of the Ramachandran maps; generating backbone atoms; performing
a CA-CA distance check for the N and C termini; minimizing the N and C termini of the
conformer to be the same as the high resolution structure; checking the handedness of the
conformer; performing a van der Waals check on backbone atoms; aligning the conformer to
the high resolution structure; performing a van der Waals check on backbone and template
protein atoms; adding side chains to the backbone atoms; and performing a van der Waals
check on all atoms.

[0018] An embodiment of the present invention is a method for determining the rate
of disulfide bond loop closure in a protein comprising at least one two-cysteine motif
represented by C-X_n-C, where n is an integer, the method comprising the steps of: performing
a van der Waals calculation on a multiplicity of conformers of the peptide and subtracting
those conformers that cannot form an intramolecular disulfide to yield an ensemble of N_0
sterically allowed conformers; analyzing the ensemble of sterically allowed conformers to
yield an ensemble of N_c conformers that can potentially form an intramolecular disulfide
bond; and calculating the ratio N_c/N_0, which represents the rate of disulfide bond loop closure
in the peptide. In another embodiment, the rate may be compared to the rate of disulfide bond
loop closure of the peptide containing at least one different two-cysteine motif. In a further
embodiment, the method may comprise the step of generating peptide backbone coordinates
for the C-X_n-C motif from standard bond angles, bond lengths and φ, ψ, ω dihedral angles
randomly obtained within the allowed regions of the φ, ψ Ramachandran map for each residue to
yield the multiplicity of conformers of the peptide. In a further embodiment, the method may
further comprise the step of using a side chain rotamer library to generate C-X_n-C side chain
coordinates to yield the multiplicity of conformers of the peptide. In another embodiment of
the present invention, analyzing the sterically allowed conformers may comprise calculating
the free energy of the conformers based upon the solvent accessible surface area. In yet
another embodiment of the present invention, analyzing the sterically allowed conformers
may further comprise flexibly modeling the cysteine side chains. In a further embodiment of
the present invention, the method may comprise the step of weighting N_c and N_0 by the
difference in free energy (ΔG) between the dithiol and disulfide forms of the C-X_n-C motif
and calculating the ratio of the ΔG-weighted N_c to the ΔG-weighted N_0, which represents the
energy-weighted rate of disulfide loop closure in the peptide. In a further embodiment of the
present invention, the method may comprise the step of identifying an ensemble of N_c
conformers of the protein that can potentially form an intramolecular disulfide bond. In
another embodiment of the present invention, the method may comprise docking the ensemble of
N_c conformers to a binding site on a template biomolecule to yield an ensemble of aligned
conformers, and performing a van der Waals calculation on the ensemble of aligned conformers
to yield an ensemble of N_b sterically allowed conformers that bind to the template
biomolecule. In a further embodiment, docking the ensemble of N_c conformers to a binding
site on a template biomolecule may comprise the steps of: aligning the ensemble of N_c
conformers to a binding site on a template biomolecule to yield an ensemble of aligned
conformers; and performing a van der Waals calculation on the ensemble of aligned
conformers to yield an ensemble of N_b sterically allowed conformers that bind to the template
biomolecule. In another embodiment of the invention, the peptide may further comprise a
plurality of two-cysteine motifs represented by C-X_n-C, wherein n is independently an integer
for each two-cysteine motif.
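The ratio N_c/N_0 is a simple count ratio. The exact energy weighting is not legible in this copy of the specification, so the weighted variant below is only a sketch under an assumed Boltzmann scheme, in which each sterically allowed conformer contributes exp(-G/RT); the function names and the 298 K default are illustrative assumptions.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal mol^-1 K^-1


def closure_ratio(n_c, n_0):
    """Unweighted loop-closure measure N_c / N_0."""
    if n_0 <= 0:
        raise ValueError("N_0 must be positive")
    return n_c / n_0


def weighted_closure_ratio(energies, can_close, temperature=298.0):
    """Boltzmann-weighted analogue of N_c / N_0 (assumed weighting scheme).

    energies: free energy (kcal/mol) of each sterically allowed conformer.
    can_close: parallel booleans, True if the conformer can form the disulfide.
    """
    if not energies or len(energies) != len(can_close):
        raise ValueError("need one flag per conformer")
    e_min = min(energies)  # shift by the minimum for numerical stability
    weights = [math.exp(-(e - e_min) / (R_KCAL * temperature)) for e in energies]
    closed = sum(w for w, ok in zip(weights, can_close) if ok)
    return closed / sum(weights)
```

When all conformers have equal energy the weighted ratio reduces to N_c/N_0; lower-energy disulfide-competent conformers pull it upward.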

[0019] Another embodiment of the present invention is a method for assessing the
binding affinity of a protein to a template molecule, wherein the protein comprises at least
one two-cysteine motif represented by C-X_n-C, where n is an integer, the method comprising:
docking the ensemble of N_c conformers to a binding site on a template biomolecule to yield
an ensemble of N_b conformers that bind the template biomolecule; and calculating the ratio
N_b/N_c, which is indicative of the binding affinity of the protein for the template biomolecule.
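Because N_b/N_c is used as an affinity indicator, candidate peptides can be compared simply by their ratios. A minimal sketch follows; the candidate names and counts are hypothetical, and the counts are assumed to come from the docking and van der Waals steps described above.

```python
def binding_ratio(n_b, n_c):
    """Affinity indicator N_b / N_c for one peptide."""
    if n_c <= 0:
        raise ValueError("no disulfide-forming conformers (N_c = 0)")
    return n_b / n_c


def rank_candidates(counts):
    """Rank candidate peptides by N_b / N_c, highest first.

    counts: mapping of candidate name -> (N_b, N_c).
    """
    return sorted(counts, key=lambda name: binding_ratio(*counts[name]), reverse=True)


# Hypothetical counts for three candidate peptides.
example = {"pepA": (5, 100), "pepB": (8, 40), "pepC": (0, 10)}
print(rank_candidates(example))  # pepB (0.2), pepA (0.05), pepC (0.0)
```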

[0020] Yet another embodiment of the present invention is a method for assessing the
binding affinity of a protein to a template molecule, wherein the protein comprises at least
one two-cysteine motif represented by C-X_n-C, where n is an integer, the method comprising
the steps of: screening a population of candidate peptides comprising at least one
two-cysteine motif represented by C-X_n-C, where n is an integer, to yield a plurality of
candidate peptides that can potentially form an intramolecular disulfide bond; and, for at
least one candidate peptide that can potentially form an intramolecular disulfide bond,
docking the ensemble of N_c conformers to a binding site on a template biomolecule to yield
an ensemble of N_b conformers that bind the template biomolecule and calculating the ratio
N_b/N_c, which is indicative of the binding affinity of the candidate peptide for the template
biomolecule. In another embodiment of the invention, each candidate peptide may comprise a
pre-selected amino acid sequence. In yet another embodiment of the invention, the pre-selected
amino acid sequence may predispose the peptide to form a desired secondary structure. In
another embodiment of the invention, the desired secondary structure may be a β-turn.

[0021] Another embodiment of the invention is a method for modifying a protein
comprising the steps of: (i) evaluating an X-ray crystal structure or a nuclear magnetic
resonance solution structure comprising an oxidized reference peptide bound to a target
molecule, the reference peptide comprising at least one intramolecular disulfide bond, to
identify at least two amino acids at positions favorable to intramolecular disulfide bond
formation; (ii) substituting cysteines for the two amino acids in the reference peptide to yield a
modified peptide comprising at least four cysteines; (iii) identifying an ensemble of N_c
conformers of the modified peptide that can potentially form at least two intramolecular
disulfide bonds; (iv) docking the ensemble of N_c conformers to the binding site on the template
biomolecule to yield an ensemble of N_b conformers that bind the template biomolecule;
(v) calculating the ratio N_b/N_c, which is indicative of the binding affinity of the modified
peptide for the template biomolecule; and repeating steps (i)-(v) to yield modified peptides
having cysteine substitutions at different positions so as to identify modified peptides with the
highest N_b/N_c ratios. In another embodiment of the invention, identifying an ensemble of N_c
conformers of the modified peptide that can potentially form at least two intramolecular
disulfide bonds may comprise the steps of: identifying a first conformer of the peptide that
can potentially form a first intramolecular disulfide bond defining a first disulfide-bonded
loop; constraining the model by the first disulfide bond; and identifying a second conformer
of the peptide that can potentially form a second intramolecular disulfide bond defining a
second, longer disulfide-bonded loop. In another embodiment of the invention, if a second
conformer is not identified after about 5 to about 10 attempts to identify said conformer, the
method may further comprise the steps of: eliminating the first disulfide bond from the
model; identifying a first conformer of the peptide that can potentially form a first
intramolecular disulfide bond defining a different first disulfide-bonded loop; constraining
the model by the first disulfide bond; and identifying a second conformer of the peptide that
can potentially form a second intramolecular disulfide bond defining a second, longer
disulfide-bonded loop.
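The two-stage search with a bounded number of retries for the second disulfide can be sketched as a small control loop. The callables `find_first` and `find_second` are hypothetical stand-ins for the conformer-generation steps; each call to `find_second` is assumed to be one attempt that returns a conformer or `None`.

```python
def find_two_disulfide_conformer(first_loops, find_first, find_second,
                                 max_second_attempts=10):
    """Search for a conformer carrying two intramolecular disulfides.

    first_loops: candidate cysteine pairs for the shorter first loop, tried in order.
    find_first(loop): conformer constrained by that first disulfide, or None.
    find_second(conformer): one attempt at closing the second, longer loop;
        returns a conformer or None.
    """
    for loop in first_loops:
        first = find_first(loop)
        if first is None:
            continue
        for _ in range(max_second_attempts):
            second = find_second(first)
            if second is not None:
                return loop, second
        # No second conformer found: drop this first disulfide and
        # try a different first disulfide-bonded loop.
    return None
```

The default of 10 attempts sits at the upper end of the "about 5 to about 10" range stated above; the value is a tunable assumption.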

[0022] Another embodiment of the present invention is a method for assessing the
binding affinity of a protein to a template molecule, wherein the protein comprises a flexible
loop, the method comprising the steps of: generating a peptide conformation of length N
from a starting residue I and matching it to a target residue I + N on the peptide model;
accepting the loop conformation when the deviation between residue N and the target residue
is small; closing the loop using a geometric minimization method; selecting the residue
conformation by the method of performing a van der Waals calculation on a multiplicity of
conformers of the peptide and subtracting those conformers that cannot form an
intramolecular disulfide to yield an ensemble of N_0 sterically allowed conformers, analyzing
the ensemble of sterically allowed conformers to yield an ensemble of N_c conformers that can
potentially form an intramolecular disulfide bond, calculating the ratio N_c/N_0, which
represents the rate of disulfide bond loop closure in the peptide, and generating peptide
backbone coordinates for the C-X_n-C motif from standard bond angles, bond lengths and φ, ψ,
ω dihedral angles randomly obtained within the allowed regions of a φ, ψ Ramachandran
map for each residue to yield the multiplicity of conformers of the peptide; generating an
ensemble of surface loops; and estimating the binding affinity by testing the docking of the
full mini-protein ensemble and peptide target containing the loop ensemble.
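The loop acceptance test ("accepting the loop conformation when the deviation between residue N and the target residue is small") can be sketched as an RMS deviation over matched backbone atoms, preceded by a cheap reachability check on the CA-CA distance. The 3.8 Å virtual-bond length is the familiar trans CA-CA spacing; the 0.5 Å cutoff and the function names are assumptions for illustration.

```python
import math

CA_CA_VIRTUAL_BOND = 3.8  # approx. CA(i)-CA(i+1) distance for a trans peptide, angstroms


def can_span(ca_start, ca_target, n_residues):
    """Necessary condition: n_residues virtual bonds cannot span more than n * 3.8 A."""
    return math.dist(ca_start, ca_target) <= n_residues * CA_CA_VIRTUAL_BOND


def end_deviation(generated_end, target):
    """RMS deviation between matched atoms (e.g. N, CA, C) of the loop end and the target."""
    if not target or len(generated_end) != len(target):
        raise ValueError("atom lists must be non-empty and matched")
    sq = [math.dist(p, q) ** 2 for p, q in zip(generated_end, target)]
    return math.sqrt(sum(sq) / len(sq))


def accept_loop(generated_end, target, cutoff=0.5):
    """Accept the loop if the end residue lies within `cutoff` angstroms RMS of the target."""
    return end_deviation(generated_end, target) <= cutoff
```

Accepted loops would then go on to the geometric minimization that closes the remaining small gap.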

[0023] Another embodiment of the invention is a protein produced by a
computer-assisted method for use in modifying a protein comprising the steps of: inputting a peptide
sequence and parameters for analysis into a computer-assisted modeling program; randomly
generating φ, ψ, ω angles; generating a peptide backbone; performing a van der Waals check of the
backbone; calculating a solvent accessible surface based energy of all conformers; modeling
the disulfide(s); performing a van der Waals check after each rotamer is added; and
calculating a solvent accessible surface based energy of all conformers.

[0024] An embodiment of the present invention is a protein produced by protein
miniaturization comprising modeling a protein to have the necessary active site conformation
using the method of inputting a peptide sequence and parameters for analysis into a
computer-assisted modeling program; randomly generating φ, ψ, ω angles; generating a peptide
backbone; performing a van der Waals check of the backbone; calculating a solvent
accessible surface based energy of all conformers; modeling the disulfide(s); performing a
van der Waals check after each rotamer is added; and calculating a solvent accessible surface
based energy of all conformers while reducing the total number of amino acids in the protein.

[0025] Another embodiment of the present invention is a protein capable of docking
into a binding site wherein the conformation of a portion of said protein was constrained by
the introduction of a disulfide bond by the method of inputting a peptide sequence and
parameters for analysis into a computer-assisted modeling program; randomly generating φ,
ψ and ω angles; generating a peptide backbone; performing a van der Waals check of the backbone;
calculating a solvent accessible surface based energy of all conformers; modeling the
disulfide(s); performing a van der Waals check after each rotamer is added; and calculating a
solvent accessible surface based energy of all conformers.

[0026] An embodiment of the present invention is a protein, created by the method of
inputting a peptide sequence and parameters for analysis into a computer-assisted modeling
program; randomly generating φ, ψ and ω angles; generating a peptide backbone; performing a van
der Waals check of the backbone; calculating a solvent accessible surface based energy of all
conformers; modeling the disulfide(s); performing a van der Waals check after each rotamer
is added; and calculating a solvent accessible surface based energy of all conformers, having
the characteristic of inhibiting the binding of a virus to a cell, wherein the protein is based
upon a tertiary structure of a toxin and comprises at least one loop constrained by a disulfide.
In still another embodiment, an ensemble of intramolecular disulfide bond-forming
conformers of said loop from the protein may be produced by this method.

[0027] Still another embodiment of the present invention is a protein having
decreased conformational entropy loss upon binding to a template molecule in comparison to
the naturally occurring protein due to the constraint of at least one loop of an unstable region
of a protein in conformational space by the formation of a disulfide bond other than the disulfide
bonds found in the naturally occurring protein using the method of inputting a peptide
sequence and parameters for analysis into a computer-assisted modeling program; randomly
generating φ, ψ and ω angles; generating a peptide backbone; performing a van der Waals check of
the backbone; calculating a solvent accessible surface based energy of all conformers;
modeling the disulfide(s); performing a van der Waals check after each rotamer is added; and
calculating a solvent accessible surface based energy of all conformers. In still another
embodiment, an ensemble of intramolecular disulfide bond-forming conformers of said loop
of the protein may be produced by this method.

[0028] An embodiment of the present invention is a protein produced by a
computer-assisted method for use in modifying a protein comprising the steps of: inputting a peptide
sequence and parameters for analysis; searching conformational space in the allowed regions
of the Ramachandran plots; minimizing the N and C termini of the conformer to be the same
as the high resolution structure; checking the handedness of the conformer; aligning the
conformer to the high resolution structure; and performing a van der Waals calculation.

[0029] Another embodiment of the present invention is a protein modified by the
method comprising the steps of: evaluating an X-ray crystal structure or a nuclear magnetic
resonance solution structure comprising an oxidized reference peptide bound to a target
molecule, the reference peptide comprising at least one intramolecular disulfide bond, to
identify at least two amino acids at positions favorable to intramolecular disulfide bond
formation; substituting cysteines for the two amino acids in the reference peptide to yield a
modified peptide comprising at least four cysteines; identifying an ensemble of N_c
conformers of the modified peptide that can potentially form at least two intramolecular
disulfide bonds; docking the ensemble of N_c conformers to the binding site on the template
biomolecule to yield an ensemble of N_b conformers that bind the template biomolecule;
calculating the ratio N_b/N_c, which is indicative of the binding affinity of the modified
peptide for the template biomolecule; and repeating these steps to yield modified peptides
having cysteine substitutions at different positions so as to identify modified peptides with
the highest N_b/N_c ratios.

[0030] A further aspect of the invention is a computer system that can implement the
described methods. The computer system has a software program coded to perform the
described methods. Preferably, the software program reads protein sequence data from a
database or from an input file. One embodiment of such a computer system for designing a
modified protein includes a database containing a set of protein sequence data and a software
program coupled with said database for interaction with the database. The software program
is adapted for performing the steps of randomly generating conformational angles from the
set of protein sequence data, generating a protein backbone using the conformational angles,
performing a van der Waals calculation of the protein backbone, calculating a solvent
accessible surface based energy of conformers, modeling disulfide bonds in the protein
backbone, performing a van der Waals calculation for the disulfide bonds, calculating a
solvent accessible surface based energy of the conformers generated in the previous steps,
and creating the modified protein with the structural characteristics found in the above steps.

[0031] Another embodiment of a computer system for designing a modified protein
has a database containing a set of protein sequence data and a software program coupled with
said database. The software program is adapted for performing the steps of randomly
generating conformational angles in the allowed regions of the Ramachandran maps from the set of
protein sequence data, generating a protein backbone using said conformational angles,
determining disulfide bonds in the protein backbone, calculating linear conformers,
calculating a solvent accessible surface based energy of the conformers generated in the
previous steps, and creating the modified protein using the structural characteristics identified in
the above steps. The calculating step may be performed by the software or linked to an
external program for calculating conformers.

[0032] Another aspect of the invention is a computer-readable storage medium having
stored therein a software program that is capable of executing the methods described herein.
The computer-readable medium may be any storage medium readable by a computer and, for
purposes of illustration but not limitation, may include floppy disks, hard drives,
storage drives, disk packs, ROM, RAM, PC cards, optical media, and magnetic media.

[0033] Other objects, features and advantages of the present invention will become 
apparent from the following detailed description. It should be understood, however, that the 
detailed description and the specific examples, while indicating preferred embodiments of the 
invention, are given by way of illustration only, since various changes and modifications 
within the spirit and scope of the invention will become apparent to those skilled in the art 
from this detailed description. 

BRIEF SUMMARY OF THE DRAWINGS 
[0034] The following drawings form part of the present specification and are included
to further demonstrate certain aspects of the present invention. The invention may be better
understood by reference to one or more of these drawings in combination with the detailed
description of specific embodiments presented herein.

[0035] FIG. 1. In the diagram, the letters A, B and C are the starting positions of the 
three atoms. The letters p, q and v are the vectors whose lengths are the standard bond 
lengths. X0, a temporary position of X, is in the q direction. The distance between C and X0 
is the same as the C-X bond length.
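FIG. 1 places a new atom X from three previously placed atoms A, B and C using a standard bond length, bond angle and dihedral angle. A minimal sketch of such a placement follows. It uses the natural-extension reference frame (NeRF) formulation, which is swapped in here as an equivalent construction rather than a transcription of the p, q, v vector procedure of FIG. 1; all helper names are illustrative.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(a):
    return math.sqrt(dot(a, a))


def unit(a):
    n = norm(a)
    return tuple(x / n for x in a)


def place_atom(a, b, c, bond_length, bond_angle_deg, torsion_deg):
    """Place atom X bonded to C so that |C-X| = bond_length,
    angle B-C-X = bond_angle_deg and dihedral A-B-C-X = torsion_deg."""
    theta = math.radians(bond_angle_deg)
    tau = math.radians(torsion_deg)
    bc = unit(sub(c, b))              # local x axis along B->C
    n = unit(cross(sub(b, a), bc))    # normal to the A-B-C plane
    m = cross(n, bc)                  # completes the right-handed frame
    # Displacement of X from C expressed in the local (bc, m, n) frame.
    d = (-bond_length * math.cos(theta),
         bond_length * math.sin(theta) * math.cos(tau),
         bond_length * math.sin(theta) * math.sin(tau))
    return tuple(c[k] + d[0] * bc[k] + d[1] * m[k] + d[2] * n[k] for k in range(3))
```

In a backbone builder, placing N(i+1) from N(i), CA(i) and C(i) would use ψ(i) as the torsion, and chaining such calls turns a list of sampled angles into coordinates.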

[0036] FIG. 2. Two successive peptide units (residues i-1 and i) are selected from the
polypeptide backbone. Rotation about the N-Cα bond is denoted by φ, rotation about the Cα-C
bond by ψ, and rotation about the C-N bond by ω.
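The φ, ψ and ω rotations of FIG. 2 are measured as signed dihedral angles over four consecutive backbone atoms. A minimal sketch follows; it assumes the common conventions φ(i) = C(i-1)-N(i)-Cα(i)-C(i), ψ(i) = N(i)-Cα(i)-C(i)-N(i+1) and ω taken about the C(i)-N(i+1) bond, and the function names are illustrative.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle p0-p1-p2-p3 in degrees, in the range [-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / length for x in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def backbone_torsions(prev_c, n, ca, c, next_n, next_ca):
    """phi, psi and omega for residue i (omega about the C(i)-N(i+1) bond)."""
    return (dihedral(prev_c, n, ca, c),
            dihedral(n, ca, c, next_n),
            dihedral(ca, c, next_n, next_ca))
```

Using `atan2` keeps the sign of the angle, which matters because the allowed regions of a Ramachandran map are not symmetric about zero.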

[0037] FIG. 3A, FIG. 3B, FIG. 3C and FIG. 3D. Assignment of φ and ψ angles.
The less-favored regions, bounded by dashed lines, are assigned 30% fewer point pairs than
the favored regions, which are bounded by solid lines. FIG. 3A shows ALA, FIG. 3B VAL,
FIG. 3C GLY and FIG. 3D PRO.
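The point-pair assignment of FIG. 3 can be sketched as building a pool of (φ, ψ) pairs in which each less-favored region receives 30% fewer pairs than each favored region, then drawing from the pool uniformly. The rectangular region bounds below are hypothetical stand-ins for the outlined regions of the ALA map, and interpreting "30% fewer" as a per-region count is an assumption.

```python
import random

# Hypothetical rectangular stand-ins for Ramachandran regions (degrees):
# (phi_min, phi_max, psi_min, psi_max, favored)
HYPOTHETICAL_ALA_REGIONS = [
    (-180.0, -45.0, 90.0, 180.0, True),    # beta region (assumed bounds)
    (-160.0, -45.0, -70.0, -20.0, True),   # right-handed alpha (assumed bounds)
    (-180.0, -45.0, -20.0, 90.0, False),   # bridge, less favored (assumed)
    (45.0, 90.0, 0.0, 90.0, False),        # left-handed alpha, less favored (assumed)
]
LESS_FAVORED_FRACTION = 0.7  # "30% fewer point pairs" than a favored region


def build_point_pairs(regions, pairs_per_favored, rng):
    """Pool of (phi, psi) pairs drawn uniformly inside each region."""
    pool = []
    for phi_lo, phi_hi, psi_lo, psi_hi, favored in regions:
        count = pairs_per_favored if favored else round(pairs_per_favored * LESS_FAVORED_FRACTION)
        for _ in range(count):
            pool.append((rng.uniform(phi_lo, phi_hi), rng.uniform(psi_lo, psi_hi)))
    return pool


def sample_phi_psi(pool, rng):
    """Draw one (phi, psi) pair for a residue from the pre-built pool."""
    return rng.choice(pool)
```

A separate pool would be built per residue class (ALA, VAL, GLY, PRO), matching the four maps of FIG. 3; passing an explicit `random.Random` makes runs reproducible.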

[0038] FIG. 4. The two cysteines (i and j) assigned for disulfide modeling. The positions of the sulfur atoms (SG) are restricted to the circles obtained by rotation about the Cα-Cβ bond.
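The circle construction of FIG. 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the MPMOD code: the Cβ-Sγ bond length (1.81 Å), Cα-Cβ-Sγ angle (114°), target S-S distance (2.04 Å), 0.10 Å tolerance and 5° scan step are assumed typical values chosen for the example, and the names `sg_circle` and `disulfide_possible` are hypothetical. Each cysteine's Sγ is swept around its Cα-Cβ axis by scanning χ1, and a bond is deemed possible if any pair of points on the two circles lies near the S-S bond length.

```python
import math

# Assumed geometry (typical literature values, not taken from this document).
CB_SG_BOND = 1.81       # Cβ-Sγ bond length, Å
CA_CB_SG_ANGLE = 114.0  # Cα-Cβ-Sγ bond angle, degrees
SS_BOND = 2.04          # target Sγ-Sγ distance, Å
SS_TOL = 0.10           # accepted deviation from SS_BOND, Å


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(a):
    length = math.sqrt(sum(x * x for x in a))
    return tuple(x / length for x in a)


def place(a, b, c, bond, angle_deg, torsion_deg):
    """Atom X bonded to c with |c-X| = bond, angle b-c-X and dihedral a-b-c-X."""
    ang, tor = math.radians(angle_deg), math.radians(torsion_deg)
    bc = unit(sub(c, b))
    n = unit(cross(sub(b, a), bc))
    m = cross(n, bc)
    d = (-bond * math.cos(ang),
         bond * math.sin(ang) * math.cos(tor),
         bond * math.sin(ang) * math.sin(tor))
    return tuple(c[k] + d[0] * bc[k] + d[1] * m[k] + d[2] * n[k] for k in range(3))


def sg_circle(n, ca, cb, step_deg=5.0):
    """Sγ positions swept by rotation about Cα-Cβ (scanning χ1 = N-Cα-Cβ-Sγ)."""
    steps = int(round(360.0 / step_deg))
    return [place(n, ca, cb, CB_SG_BOND, CA_CB_SG_ANGLE, k * step_deg)
            for k in range(steps)]


def disulfide_possible(cys_i, cys_j, step_deg=5.0):
    """True if any Sγ pair on the two circles lies within SS_TOL of SS_BOND.

    cys_i, cys_j: (N, Cα, Cβ) coordinate triples for the two cysteines.
    """
    circle_i = sg_circle(*cys_i, step_deg=step_deg)
    circle_j = sg_circle(*cys_j, step_deg=step_deg)
    return any(abs(math.dist(s, t) - SS_BOND) <= SS_TOL
               for s in circle_i for t in circle_j)
```

A fuller model would also check the Cβ-Sγ-Sγ angles and the χ3 dihedral; this sketch tests the S-S distance only.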

[0039] FIG. 5. The MPMOD flow chart. The input parameters (step 101), such as the peptide sequence and disulfide bond connectivity, are loaded, and the conformational angles (φ, ψ, ω) are then generated from the four maps (step 102). The main-chain and side-chain atoms are generated based on these angles. Van der Waals checks are performed separately for the backbone atoms and the side-chain atoms (step 103). If there is a van der Waals violation, the conformer is rejected and another set of conformational angles is drawn, until the peptide is completed without any atom collisions. The coordinates of the peptide are then recorded and the solvent accessible surface (SAS) based energy is calculated (step 104). The disulfide bond is then modeled to determine whether a disulfide bond is possible for the specified cysteine pairs (step 105). If a disulfide bond is possible, the SAS energy for this conformer is calculated. If it is not, another set of conformational angles is tried and the procedure is repeated until a conformer with a disulfide bond is obtained. Finally, the SAS energy is calculated for this conformer again (step 106).

[0040] FIG. 6A and FIG. 6B. Comparison of probabilities obtained from modeling with the equilibrium constant Kc obtained from experiment (Zhang & Snyder, 1989). The probabilities have been scaled to the experimental values. The scale factor for CXC and CXXC was defined by K = Z(Exp)/Z(Mod), where Z(Exp) is the sum of all the experimental values and Z(Mod) is the sum of all modeled values. Each individual probability of a modeled conformer is multiplied by the scale factor K to obtain the scaled value. The dark bars are from experiment and the light bars are from calculation. FIG. 6A shows the values for CXC; FIG. 6B shows the values for CXXC.
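The scaling used in FIG. 6 amounts to one division plus one multiplication per conformer. A minimal sketch (the function name `scale_to_experiment` is hypothetical, and the numbers in the check are made up for illustration):

```python
def scale_to_experiment(experimental, modeled):
    """Return (K, scaled) with K = Z(Exp)/Z(Mod) and scaled[i] = K * modeled[i]."""
    z_exp = sum(experimental)   # Z(Exp): sum of all experimental values
    z_mod = sum(modeled)        # Z(Mod): sum of all modeled values
    if z_mod == 0:
        raise ValueError("modeled values sum to zero; scale factor undefined")
    k = z_exp / z_mod
    return k, [k * p for p in modeled]
```

Because a single K is applied to the whole series, the scaled values always sum to Z(Exp); only the relative pattern of the modeled probabilities is being compared with experiment.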

[0041] FIG. 7. Variation of the ratio Nc/ND as the number of conformers in the ensemble increases. Only the CXXC series is shown; the CXC series is similar.
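Monitoring how the loop-closure ratio settles as conformers accumulate (FIG. 7) could look like the sketch below. It assumes, for illustration, that the ratio is the number of conformers able to form the disulfide divided by the number generated so far; `running_closure_ratio` and its `every` parameter are hypothetical.

```python
def running_closure_ratio(closed_flags, every=1):
    """Running fraction of conformers that close the SS loop.

    closed_flags: iterable of booleans, one per generated conformer.
    every: record the ratio after every `every` conformers.
    """
    ratios, closed = [], 0
    for n, flag in enumerate(closed_flags, start=1):
        closed += bool(flag)
        if n % every == 0:
            ratios.append(closed / n)
    return ratios
```

Plotting the returned list against ensemble size gives a curve of the FIG. 7 type; convergence can be judged by how little the last few recorded values differ.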

[0042] FIG. 8. Comparison of probabilities obtained from modeling with the equilibrium constant Kc obtained from experiment (Zhang & Snyder, 1989). Series 1 is the ratio of the number of hydrogen bonds in the SS-bond-closed conformers to the total number of conformers. Series 2 is this ratio weighted by the state probability.
Series 3 is Kc from experiment. Series 4 is the ratio Nc/ND from the hard sphere atom model. To show all of them in one figure, the series were scaled by the following factors: series 1 (*1000), series 1 (*10000), series 2 (*100), series 3 (*0.1), series 4 (*1000).

[0043] FIG. 9. Interactions in the peptide-streptavidin complex. Here the peptide has two cross-linked disulfide bonds. The HPQ motif sits in the binding pocket, and three hydrogen bonds are involved in the interaction.

[0044] FIG. 10. Disulfide-bonded random conformations for the ensemble CCHPQCGMVEEC. Each conformer has two cross-linked disulfide bonds. The randomly generated conformers adopt a variety of conformations.

[0045] FIG. 11. The number of times each residue of the peptide CCHPQCGMVEEC collides with the target streptavidin.

[0046] FIG. 12. Correlation of the "binding ratio" with the observed binding constant Ka. The straight line is fitted by minimizing the sum $\mathrm{Res} = \sum_{i=1}^{N} \left(\Delta G_i^{f} - \Delta G_i^{m}\right)^2$.
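Minimizing Res over a straight line is ordinary least squares. A sketch, assuming `xs` holds a quantity derived from the modeled binding ratio and `ys` the free energies from the observed Ka (which transform was applied is not stated here); `fit_line` is a hypothetical name, and the returned `res` is the minimized sum of squared residuals.

```python
def fit_line(xs, ys):
    """Least-squares fit y ≈ a*x + b; returns (a, b, res), res = Σ (a*x + b - y)^2."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values are all equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx               # slope
    b = my - a * mx             # intercept
    res = sum((a * x + b - y) ** 2 for x, y in zip(xs, ys))
    return a, b, res
```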

[0047] FIG. 13A and FIG. 13B. Flow chart of the MPMOD program (Fast Mod). 

[0048] FIG. 14A and FIG. 14B. Flow chart of the MPMOD program (Slow Mod). 

[0049] FIG. 15. Flow chart of the MPMOD program (Loop Generation). 

[0050] FIG. 16. Flow chart of the MPMOD program's modeling of disulfide bonds. 

[0051] FIG. 17. Flow chart of the MPMOD program's binding test. 
DETAILED DESCRIPTION OF THE INVENTION
[0052] The present application includes methods of modifying a peptide to increase its binding affinity for a template molecule by increasing the stability of the peptide, thereby decreasing the conformational entropy lost upon binding to the template molecule.

I. Definitions 

[0053] "A" or "an", as used herein in the specification, may mean one or more than one. As used herein in the claim(s), when used in conjunction with the word "comprising", the words "a" or "an" may mean one or more than one.

[0054] Another, as used herein, may mean at least a second or more. 

[0055] Based upon a tertiary structure, as used herein, refers to a structure that possesses a backbone structure similar to that of the original structure on which it is based.

[0056] Conformer, as used herein, refers to various non-superimposable three- 
dimensional arrangements of atoms that are interconvertible without breaking covalent 
bonds. 

[0057] Constrained, as used herein, refers to a limitation in the conformational space 
that the peptide may adopt. 

[0058] Disulfide bridge and disulfide bond, as used herein, refer to a covalent bond between the sulfur atoms of two cysteines.

[0059] Generate, as used herein, refers to the act of defining or originating by the use of one or more operations. The individuals using the invention may create the matter or data themselves or locate the matter or data elsewhere and utilize it in the practice of the invention.

[0060] Loops, as used herein, are turns in the polypeptide chain that reverse the direction of the chain at the surface of the molecule.

[0061] Rotamer, as used herein, refers to a low energy amino acid side chain 
conformation. 

[0062] Peptide, as used herein, refers to a chain of amino acids with a defined sequence whose physical properties are those expected from the sum of its amino acid residues and which has no fixed three-dimensional structure.

[0063] Protein, as used herein, refers to a chain of amino acid residues usually of defined sequence, length and three-dimensional structure. Because the polymerization reaction that produces a protein results in the loss of one molecule of water from each amino acid, proteins are often said to be composed of amino acid residues. Natural protein molecules may contain as many as 20 different types of amino acid residues, each of which contains a distinctive side chain. A protein may be composed of multiple peptides.

[0064] Structural characteristics, as used herein, refers to the characteristics that are determined using the computer-assisted program, such as, but not limited to, folding characteristics, disulfide bonding, binding affinity, aggregation, solubility, immunogenicity, stability, etc. Thus, one of skill in the art realizes that the present invention may be used to determine any structural characteristic of a protein, and that this characteristic may be enhanced or reduced depending upon the application.

[0065] Template molecule, as used herein, refers to the protein to which the modified protein binds.

II. MPMOD 

[0066] Combining a random search of the conformational space within the allowed regions of the Ramachandran plot, a simple hard sphere model to generate stereochemically acceptable conformers, and flexible disulfide bond modeling provides a simple and useful tool for studying the behavior of cyclic peptides. The "rate" or probability of SS bond loop closure, defined as Nc/ND, converges to a stable value when the ensemble contains more than 1000 conformers. For the CXC and CXXC series of peptides, the modeled probability of loop closure follows the same trend as the experimentally determined equilibrium constant for all four types of peptides, and the two compare well after a common scale factor is applied. Geometry, that is, van der Waals interactions, plays a dominant role in loop closure for the small peptides CXC and CXXC. One of skill in the art realizes that the MPMOD method of protein design is not limited to protein pharmaceuticals. For example, it includes, but is not limited to, the use of the MPMOD method to design proteins that may be beneficial as diagnostic reagents, research reagents, pesticides or herbicides.

[0067] The MPMOD program is an efficient method for generating disulfide-bonded conformers. It takes about 10-20 CPU minutes to obtain 4000 disulfide-bonded CXXC conformers on a Linux system with a 450 MHz Pentium III. Because CXC conformers have a higher probability of atomic collision, generating them takes about three times more CPU time than generating CXXC conformers. However, the CPU time consumed depends strongly on the criteria used to generate the conformers.
[0068] The basic MPMOD program comprises the following steps. The input parameters, such as the peptide sequence and disulfide bond connectivity, are loaded, and the conformational angles (φ, ψ, ω) are then generated from the four maps. The main-chain and side-chain atoms are generated based on these angles. Van der Waals checks are performed separately for the backbone atoms and the side-chain atoms. If there is a van der Waals violation, the conformer is rejected and another set of conformational angles is drawn, until the peptide is completed without any atom collisions. The coordinates of the peptide are then recorded and the solvent accessible surface (SAS) based energy is calculated. The disulfide bond is then modeled to determine whether a disulfide bond is possible for the specified cysteine pairs. If a disulfide bond is possible, the SAS energy for this conformer is calculated. If it is not, another set of conformational angles is tried and the procedure is repeated until a conformer with a disulfide bond is obtained. Finally, the SAS energy is calculated for this conformer again.
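The steps above form a rejection sampler. The sketch below mirrors that control flow with hypothetical callables standing in for the MPMOD routines (angle sampling, atom building, van der Waals check, disulfide modeling and SAS energy); none of these names come from the program itself. As a simplification, it computes the SAS energy once, after the disulfide test, whereas the text computes it both before and after, and it adds an attempt budget so that an impossible constraint cannot loop forever.

```python
import random


def generate_ensemble(n_wanted, sample_angles, build, has_clash,
                      disulfide_possible, sas_energy, rng=None,
                      max_attempts=1_000_000):
    """Collect n_wanted clash-free, disulfide-compatible conformers.

    Hypothetical callables:
      sample_angles(rng) -> angle set     build(angles) -> coordinates
      has_clash(coords) -> bool           disulfide_possible(coords) -> bool
      sas_energy(coords) -> float
    Returns (list of (coords, energy), number of attempts).
    """
    rng = rng or random.Random()
    ensemble, attempts = [], 0
    while len(ensemble) < n_wanted:
        attempts += 1
        if attempts > max_attempts:
            raise RuntimeError("attempt budget exhausted")
        coords = build(sample_angles(rng))   # draw angles, then build atoms
        if has_clash(coords):                # van der Waals violation: reject
            continue
        if not disulfide_possible(coords):   # SS bond cannot close: reject
            continue
        ensemble.append((coords, sas_energy(coords)))  # accepted conformer
    return ensemble, attempts
```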

[0069] MPMOD can be used to generate disulfide-bonded conformers and/or linear conformers. To generate linear conformers in conjunction with disulfide-bonded conformers, MPMOD is linked to the COREX algorithm (Hilser & Freire, 1996).

III. Prokaryotic Peptide Display

[0070] Molecular analysis of naturally occurring and artificial protein libraries has been greatly improved by the development of various "display" methodologies. The general scheme behind display techniques is the advantageous expression of peptides and their disposition on some biological surface (phage, cell, etc.). The ability of different versions of the displaying organism to present millions of different variants allows rapid screening of the corresponding library for biological function.

[0071] In U.S. Patent 5,821,047, monovalent phage display is described. This method provides for the selection of novel proteins, and variants thereof. The method comprises fusing a gene encoding a protein of interest to the carboxy terminal domain of the gene III coat protein of the filamentous phage M13. The fusion is mutated to form a library of structurally related fusion proteins that are expressed in low quantity on the surface of phagemid candidates.

[0072] U.S. Patent 5,571,698 describes directed evolution using an M13 phagemid system. A protein is expressed as a fusion with the M13 gene III protein. Successive rounds of mutagenesis are performed, each time selecting for improved biological function, e.g., binding of a protein to a cognate binding partner.

[0073] Heterodimer phage libraries are described in U.S. Patent 5,759,817. Filamentous phage comprising a matrix of cpVIII proteins encapsulating a genome encoding first and second polypeptides of an autogenously assembling receptor, such as an antibody, are provided. The receptor is surface-integrated into the phage coat matrix via the cpVIII membrane anchor, presenting the receptor for biological assessment.

[0074] Another system, lambdoid phage, also can be used for display purposes. In U.S. Patent 5,672,024, lambdoid phage comprising a matrix of proteins encapsulating a genome encoding first and second polypeptides of an autogenously assembling receptor are prepared. The surface-integrated receptor is available on the surface of the phage for characterization.

[0075] Immunoglobulin heavy chain libraries are displayed by phage as described in U.S. Patent 5,824,520. A single chain antibody library is generated by creating highly divergent, synthetic hypervariable regions, followed by phage display and selection. The resulting antibodies were used to inhibit intracellular enzyme activity. Another patent describing antibody display is U.S. Patent 5,922,545.

[0076] Another example of phage display can be found in U.S. Patent 5,780,279. 
This method provides for the identification and selection of novel substrates for enzymes. 
The method comprises constructing a gene fusion comprising DNA encoding a polypeptide 
fused to a DNA encoding a substrate peptide, which in turn is fused to DNA encoding at least 
a portion of a phage coat protein. The DNA encoding the substrate peptide is mutated at one 
or more codons, thereby generating a family of mutants. The fusion protein is expressed on 
the surface of the phagemid particle and subjected to chemical or enzymatic modification of 
the substrate peptide. Those phagemid particles that have been modified are then separated 
from those that have not. 

[0077] Bacteria also have been used successfully to display proteins. U.S. Patent 5,348,867 describes expression of proteins on bacterial surfaces. The compositions and methods provide stable, surface-expressed polypeptide from recombinant gram-negative bacterial cell hosts. A tripartite chimeric gene and its related recombinant vector include separate DNA sequences for directing or targeting and translocating a desired gene product from a cell periplasm to the external cell surface. A wide range of polypeptides may be efficiently surface expressed using this system. See also U.S. Patents 5,508,192 and 5,866,344.

[0078] U.S. Patent 5,500,353 describes another bacterial display system. Bacteria (e.g., Caulobacter) having an S-layer modified such that the S-layer protein gene contains one or more in-frame fusions coding for one or more heterologous peptides or polypeptides are described. The proteins are expressed on the surface of the bacterium, which may advantageously be cultured as a film.

IV. Mutagenesis 

[0079] Where employed, mutagenesis will be accomplished by a variety of standard mutagenic procedures. Mutation is the process whereby changes occur in the quantity or structure of the genetic material of an organism. Mutation can involve modification of the nucleotide sequence of a single gene, blocks of genes or whole chromosomes. Changes in single genes may be the consequence of point mutations that involve the removal, addition or substitution of a single nucleotide base within a DNA sequence, or they may be the consequence of changes involving the insertion or deletion of large numbers of nucleotides.

[0080] Mutations can arise spontaneously as a result of events such as errors in the fidelity of DNA replication or the movement of transposable genetic elements (transposons) within the genome. They also are induced following exposure to chemical or physical mutagens. Such mutation-inducing agents include ionizing radiation, ultraviolet light and a diverse array of chemicals such as alkylating agents and polycyclic aromatic hydrocarbons, all of which are capable of interacting either directly or indirectly (generally following some metabolic biotransformation) with nucleic acids. The DNA lesions induced by such environmental agents may lead to modifications of base sequence when the affected DNA is replicated or repaired, and thus to a mutation. Mutation also can be site-directed through the use of particular targeting methods.

[0081] Structure-guided site-specific mutagenesis represents a powerful tool for the dissection and engineering of protein-ligand interactions (Wells et al., 1996). The technique provides for the preparation and testing of sequence variants by introducing one or more nucleotide sequence changes into a selected DNA.

[0082] Site-specific mutagenesis uses specific oligonucleotide sequences that encode the DNA sequence of the desired mutation, as well as a sufficient number of adjacent, unmodified nucleotides. In this way, a primer sequence is provided with sufficient size and complexity to form a stable duplex on both sides of the deletion junction being traversed. A primer of about 17 to 25 nucleotides in length is preferred, with about 5 to 10 residues on both sides of the junction of the sequence being altered.
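Assembling such an oligonucleotide from a template sequence is simple string slicing. `mutagenic_primer` is a hypothetical helper for a same-length substitution: with the default flank of 8 nt, a single-base change gives a 17-mer, and a flank of 10 with a 5-base change gives a 25-mer, bracketing the preferred length above. It does not consider melting temperature, secondary structure or mismatch tolerance, which a real design would.

```python
def mutagenic_primer(template, pos, new_bases, flank=8):
    """Hypothetical helper: oligo carrying `new_bases` at 0-based `pos`,
    with `flank` unmodified template nucleotides on each side of the change."""
    if not 5 <= flank <= 10:
        raise ValueError("flank outside the ~5-10 nt range described above")
    end = pos + len(new_bases)
    if pos - flank < 0 or end + flank > len(template):
        raise ValueError("not enough template sequence around the site")
    return template[pos - flank:pos] + new_bases + template[end:end + flank]
```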

[0083] The technique typically employs a bacteriophage vector that exists in both a single-stranded and double-stranded form. Vectors useful in site-directed mutagenesis include vectors such as the M13 phage. These phage vectors are commercially available and their use is generally well known to those skilled in the art. Double-stranded plasmids are also routinely employed in site-directed mutagenesis, which eliminates the step of transferring the gene of interest from a phage to a plasmid.

[0084] In general, one first obtains a single-stranded vector, or melts two strands of a 
double-stranded vector, which includes within its sequence a DNA sequence encoding the 
desired protein or genetic element. An oligonucleotide primer bearing the desired mutated 
sequence, synthetically prepared, is then annealed with the single-stranded DNA preparation, 
taking into account the degree of mismatch when selecting hybridization conditions. The 
hybridized product is subjected to DNA polymerizing enzymes such as E. coli polymerase I 
(Klenow fragment) in order to complete the synthesis of the mutation-bearing strand. Thus, a 
heteroduplex is formed, wherein one strand encodes the original non-mutated sequence, and 
the second strand bears the desired mutation. This heteroduplex vector is then used to 
transform appropriate host cells, such as E. coli cells, and clones are selected that include 
recombinant vectors bearing the mutated sequence arrangement. 

[0085] Other methods of site-directed mutagenesis are disclosed in U.S. Patents 
5,220,007; 5,284,760; 5,354,670; 5,366,878; 5,389,514; 5,635,377; and 5,789,166. 
V. Modified Polynucleotides and Polypeptides

[0086] Amino acid substitutions are generally based on the relative similarity of the 
amino acid side-chain substituents, for example, their hydrophobicity, hydrophilicity, charge, 
size, and/or the like. An analysis of the size, shape and/or type of the amino acid side-chain 
substituents reveals that arginine, lysine and/or histidine are all positively charged residues; 
that alanine, glycine and/or serine are all a similar size; and/or that phenylalanine, tryptophan 
and/or tyrosine all have a generally similar shape. Therefore, based upon these 
considerations, arginine, lysine and/or histidine; alanine, glycine and/or serine; and/or 
phenylalanine, tryptophan and/or tyrosine; are defined herein as biologically functional 
equivalents. 

[0087] To effect more quantitative changes, the hydropathic index of amino acids may be considered. Each amino acid has been assigned a hydropathic index on the basis of its hydrophobicity and/or charge characteristics; these are: isoleucine (+4.5); valine (+4.2); leucine (+3.8); phenylalanine (+2.8); cysteine/cystine (+2.5); methionine (+1.9); alanine (+1.8); glycine (-0.4); threonine (-0.7); serine (-0.8); tryptophan (-0.9); tyrosine (-1.3); proline (-1.6); histidine (-3.2); glutamate (-3.5); glutamine (-3.5); aspartate (-3.5); asparagine (-3.5); lysine (-3.9); and/or arginine (-4.5).

[0088] The importance of the hydropathic amino acid index in conferring interactive 
biological function on a protein is generally understood in the art (Kyte & Doolittle, 1982, 
incorporated herein by reference). It is known that certain amino acids may be substituted for 
other amino acids having a similar hydropathic index and/or score and/or still retain a similar 
biological activity. In making changes based upon the hydropathic index, the substitution of 
amino acids whose hydropathic indices are within ±2 is preferred, those which are within ±1 
are particularly preferred, and/or those within ±0.5 are even more particularly preferred. 
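The ±2 / ±1 / ±0.5 rule can be applied mechanically with the index values listed in [0087]. A sketch using one-letter codes (the function name `hydropathy_tier` and its labels are hypothetical):

```python
# Kyte & Doolittle (1982) hydropathic indices as listed in [0087].
HYDROPATHY = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def hydropathy_tier(original, replacement):
    """Classify a substitution by the absolute difference in hydropathic index."""
    diff = abs(HYDROPATHY[original] - HYDROPATHY[replacement])
    if diff <= 0.5:
        return "even more particularly preferred"
    if diff <= 1.0:
        return "particularly preferred"
    if diff <= 2.0:
        return "preferred"
    return "outside preferred range"
```

For example, isoleucine to valine (difference 0.3) falls in the tightest tier, while arginine to isoleucine (difference 9.0) falls outside all three.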

[0089] It also is understood in the art that the substitution of like amino acids can be 
made effectively on the basis of hydrophilicity, particularly where the biological functional 
equivalent protein and/or peptide thereby created is intended for use in immunological 
embodiments, as in certain embodiments of the present invention. U.S. Patent 4,554,101, 
incorporated herein by reference, states that the greatest local average hydrophilicity of a 
protein, as governed by the hydrophilicity of its adjacent amino acids, correlates with its 
immunogenicity and/or antigenicity, i.e., with a biological property of the protein. 

[0090] As detailed in U.S. Patent 4,554,101, the following hydrophilicity values have been assigned to amino acid residues: arginine (+3.0); lysine (+3.0); aspartate (+3.0 ± 1); glutamate (+3.0 ± 1); serine (+0.3); asparagine (+0.2); glutamine (+0.2); glycine (0); threonine (-0.4); proline (-0.5 ± 1); alanine (-0.5); histidine (-0.5); cysteine (-1.0); methionine (-1.3); valine (-1.5); leucine (-1.8); isoleucine (-1.8); tyrosine (-2.3); phenylalanine (-2.5); tryptophan (-3.4). In making changes based upon similar hydrophilicity values, the substitution of amino acids whose hydrophilicity values are within ±2 is preferred, those that are within ±1 are particularly preferred, and/or those within ±0.5 are even more particularly preferred.
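The "greatest local average hydrophilicity" mentioned in [0089] can be located with a sliding window over these values. In this sketch the six-residue window is an assumption (treat it as a parameter), the ±1 ranges are dropped in favor of the central values, and `max_local_hydrophilicity` is a hypothetical name.

```python
# Hydrophilicity values from [0090]; the ±1 ranges are dropped (central values only).
HYDROPHILICITY = {
    "R": 3.0, "K": 3.0, "D": 3.0, "E": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "T": -0.4, "P": -0.5, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "L": -1.8, "I": -1.8, "Y": -2.3, "F": -2.5, "W": -3.4,
}


def max_local_hydrophilicity(seq, window=6):
    """Return (start, average) of the window with the greatest mean hydrophilicity."""
    if len(seq) < window:
        raise ValueError("sequence shorter than window")
    best = None
    for start in range(len(seq) - window + 1):
        avg = sum(HYDROPHILICITY[aa] for aa in seq[start:start + window]) / window
        if best is None or avg > best[1]:   # keep the first maximum found
            best = (start, avg)
    return best
```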

VI. Mimetics 

[0091] The present inventors contemplate that structurally similar compounds may be 
formulated to mimic the key portions of peptide or polypeptides. Such compounds may be 
termed peptidomimetics. 

[0092] Certain mimetics that mimic elements of protein secondary and tertiary structure are described in Johnson et al. (1993). The underlying rationale behind the use of peptide mimetics is that the peptide backbone of proteins exists chiefly to orient amino acid side chains in such a way as to facilitate molecular interactions, such as those of antibody and/or antigen. A peptide mimetic is thus designed to permit molecular interactions similar to the natural molecule.

[0093] Some successful applications of the peptide mimetic concept have focused on mimetics of β-turns within proteins, which are known to be highly antigenic. As discussed herein, possible β-turn structure within a polypeptide can be predicted by computer-based algorithms. Once the component amino acids of the turn are determined, mimetics can be constructed to achieve a similar spatial orientation of the essential elements of the amino acid side chains.

[0094] Other approaches have focused on the use of small, multidisulfide-containing proteins as attractive structural templates for producing biologically active conformations that mimic the binding sites of large proteins (Vita et al., 1998). A structural motif that appears to be evolutionarily conserved in certain toxins is small (30-40 amino acids), stable and highly permissive to mutation. This motif is composed of a beta sheet and an alpha helix bridged in the interior core by three disulfides.

[0095] Beta II turns have been mimicked successfully using cyclic L-pentapeptides and those with D-amino acids (Weisshoff et al., 1999). Also, Johannesson et al. (1999) report on bicyclic tripeptides with reverse-turn-inducing properties.
[0096] Methods for generating specific structures have been disclosed in the art. For example, alpha-helix mimetics are disclosed in U.S. Patents 5,446,128; 5,710,245; 5,840,833; and 5,859,184. These structures render the peptide or protein more thermally stable and also increase resistance to proteolytic degradation. Six-, seven-, eleven-, twelve-, thirteen- and fourteen-membered ring structures are disclosed.

[0097] Methods for generating conformationally restricted beta turns and beta bulges are described, for example, in U.S. Patents 5,440,013; 5,618,914; and 5,670,155. These beta turns permit changes in side-chain substituents without corresponding changes in backbone conformation, and have appropriate termini for incorporation into peptides by standard synthesis procedures. Other types of mimetic turns include reverse and gamma turns. Reverse turn mimetics are disclosed in U.S. Patents 5,475,085 and 5,929,237, and gamma turn mimetics are described in U.S. Patents 5,672,681 and 5,674,976.

VII. EXAMPLES

[0098] The following examples are included to demonstrate preferred embodiments of the invention. It should be appreciated by those skilled in the art that the techniques disclosed in the examples which follow represent techniques discovered by the inventor to function well in the practice of the invention, and thus can be considered to constitute preferred modes for its practice. However, those of skill in the art should, in light of the present disclosure, appreciate that many changes can be made in the specific embodiments which are disclosed and still obtain a like or similar result without departing from the concept, spirit and scope of the invention. More specifically, it will be apparent that certain agents that are both chemically and physiologically related may be substituted for the agents described herein while the same or similar results would be achieved. All such similar substitutes and modifications apparent to those skilled in the art are deemed to be within the spirit, scope and concept of the invention as defined by the appended claims.

Example 1 
Construction of a polypeptide 

[0099] A unit peptide was generated using the rotation matrix $[M]^{\theta}_{\lambda,\mu,\nu}$ (Jeffreys & Jeffreys, 1950), which gives the effect on the coordinates of a point of a rotation by an angle θ about an axis through the origin having the direction cosines λ, μ, ν:

$$
[M]^{\theta}_{\lambda,\mu,\nu} =
\begin{pmatrix}
\cos\theta + \lambda^{2}(1-\cos\theta) & \lambda\mu(1-\cos\theta) - \nu\sin\theta & \lambda\nu(1-\cos\theta) + \mu\sin\theta \\
\lambda\mu(1-\cos\theta) + \nu\sin\theta & \cos\theta + \mu^{2}(1-\cos\theta) & \mu\nu(1-\cos\theta) - \lambda\sin\theta \\
\lambda\nu(1-\cos\theta) - \mu\sin\theta & \mu\nu(1-\cos\theta) + \lambda\sin\theta & \cos\theta + \nu^{2}(1-\cos\theta)
\end{pmatrix}
\qquad \text{(Eq. 1)}
$$

[0100] Given the coordinates of three successive atoms A, B and C, the three components λ, μ, ν of the matrix are determined. The fourth atom X can then be generated from this matrix, the dihedral angle χ(A-B-C-X), the bond angle α(B-C-X) and the bond length d(C-X). FIG. 1 gives a diagram for building the fourth atom X.

[0101] In the coordinate frame used in FIG. 1, the position vectors of the three atoms A, B and C are r1, r2 and r3. The vectors p, q and v are p = r2 - r1, q = r3 - r2 and v = d(q/|q|), where d is the C-X bond length. The unit vector n = (p × q)/|p × q| is normal to the plane A-B-C formed by atoms A, B and C. X0 is the temporary position of the atom along the q direction. The atom is first rotated to the X position about the rotational axis n by the rotational angle π - α, where α is the bond angle B-C-X. The final position of X is obtained by a rotation about the axis q by the dihedral angle χ (one of the three dihedral angles φ, ψ and ω). Both rotations, by π - α and by χ, are clockwise looking down the relevant vectors; the angles are positive for clockwise rotation and negative for anticlockwise rotation. The final position of atom X is expressed as

$$
\mathbf{r}_X = \mathbf{r}_C + [M]^{\chi}_{\mathbf{q}}\,[M]^{\pi-\alpha}_{\mathbf{n}}\,\mathbf{v} \qquad \text{(Eq. 2)}
$$

where the notation $[M]^{\theta}_{\mathbf{a}}$ has been used for the matrix $[M]^{\theta}_{\lambda,\mu,\nu}$ of Eq. (1), with a a unit vector in the direction (λ, μ, ν).
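Eq. 1 and Eq. 2 translate directly into code. The sketch below builds the matrix of Eq. 1 and applies the two rotations of Eq. 2 (first about n by π - α, then about q by χ); all function names are hypothetical, angles are in radians, and a standard signed-dihedral helper is included only to check the result. With this construction the measured A-B-C-X dihedral equals χ under the usual IUPAC sign convention, which matches the "clockwise looking down the vector" rule above.

```python
import math


def rotation_matrix(axis, theta):
    """Eq. 1: rotation by theta about a unit axis with direction cosines (lam, mu, nu)."""
    lam, mu, nu = axis
    c, s = math.cos(theta), math.sin(theta)
    t = 1.0 - c
    return [
        [c + lam * lam * t, lam * mu * t - nu * s, lam * nu * t + mu * s],
        [lam * mu * t + nu * s, c + mu * mu * t, mu * nu * t - lam * s],
        [lam * nu * t - mu * s, mu * nu * t + lam * s, c + nu * nu * t],
    ]


def matvec(m, v):
    return tuple(sum(m[r][k] * v[k] for k in range(3)) for r in range(3))


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(a):
    length = math.sqrt(dot(a, a))
    return tuple(x / length for x in a)


def place_atom(r1, r2, r3, d, alpha, chi):
    """Eq. 2: r_X = r_C + [M]^chi_q [M]^(pi - alpha)_n v."""
    p, q = sub(r2, r1), sub(r3, r2)
    q_hat = unit(q)
    v = tuple(d * x for x in q_hat)                     # C -> X0, along q with length d
    n = unit(cross(p, q))                               # normal to the plane A-B-C
    v = matvec(rotation_matrix(n, math.pi - alpha), v)  # sets bond angle B-C-X
    v = matvec(rotation_matrix(q_hat, chi), v)          # sets dihedral A-B-C-X
    return tuple(c + x for c, x in zip(r3, v))


def dihedral(a, b, c, d):
    """Signed dihedral angle a-b-c-d in radians (IUPAC sign convention)."""
    b1, b2, b3 = sub(b, a), sub(c, b), sub(d, c)
    y = math.sqrt(dot(b2, b2)) * dot(b1, cross(b2, b3))
    x = dot(cross(b1, b2), cross(b2, b3))
    return math.atan2(y, x)
```

For example, `place_atom((0, 1, 0), (0, 0, 0), (1, 0, 0), 1.53, math.radians(110), math.radians(-60))` returns a point 1.53 Å from C, with a 110° B-C-X angle and a -60° dihedral.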

[0102] Starting from a unit peptide, all other atoms of a polypeptide can be determined by Eq. (2). The backbone atoms (N, Cα, C, O, HN, HCα) were generated from the conformational angles (φ, ψ, ω) and the standard bond angles and bond lengths (Momany et al., 1975) listed in Table 1. The backbone atoms of residue i are built from the following parameters in braces: atom N(i) from {N(i-1), Cα(i-1), C(i-1), ψ(i-1), α(Cα(i-1)-C(i-1)-N(i)), d(C(i-1)-N(i))}; Cα(i) from {Cα(i-1), C(i-1), N(i), ω(i-1), α(C(i-1)-N(i)-Cα(i)), d(N(i)-Cα(i))}; C(i) from {C(i-1), N(i), Cα(i), φ(i), α(N(i)-Cα(i)-C(i)), d(Cα(i)-C(i))}; O(i) from {C(i-1), N(i), Cα(i), α(Cα(i)-C(i)-O(i)), d(C(i)-O(i))}; and HN(i) from {Cα(i-1), C(i-1), N(i), α(Cα(i-1)-C(i-1)-N(i)), d(N(i)-HN(i))}, where α is the bond angle formed by the three atoms in parentheses and d is the bond length between two atoms. Cβ and HCα are treated specially because of the tetrahedral geometry about Cα with the N and C atoms; neither atom depends on the backbone dihedral angles. Of the two possible positions of the Cβ atom, the one corresponding to L-amino acid residues has been used throughout the studies. Hence, the Cβ(i) atom of residue i is built from {N(i), Cα(i), C(i), 109.5°, d(Cα(i)-Cβ(i))} and HCα(i) from {N(i), Cα(i), C(i), -109.5°, d(Cα(i)-HCα(i))}. FIG. 2 shows the diagram for two successive unit peptides.
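The residue-by-residue recipe for N, Cα and C can be sketched as a loop over (φ, ψ, ω) using the Table 1 geometry. This is an illustration, not the MPMOD source: the placement step uses the NeRF-style local-frame construction, which yields the same atom as the two explicit rotations of Eq. 2; the first residue is seeded at an arbitrary origin and orientation; O, HN and the proline C-N bond length (1.355 Å) are omitted; and all names are hypothetical.

```python
import math

# Table 1 geometry (Å, degrees); the proline C-N bond length is ignored here.
C_N, N_CA, CA_C = 1.325, 1.453, 1.530
ANG_CA_C_N, ANG_C_N_CA, ANG_N_CA_C = 115.0, 121.0, 109.5


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _unit(a):
    length = math.sqrt(sum(w * w for w in a))
    return tuple(w / length for w in a)


def place(a, b, c, bond, angle_deg, torsion_deg):
    """Atom X with |c-X| = bond, angle b-c-X = angle, dihedral a-b-c-X = torsion."""
    ang, tor = math.radians(angle_deg), math.radians(torsion_deg)
    bc = _unit(_sub(c, b))
    n = _unit(_cross(_sub(b, a), bc))
    m = _cross(n, bc)
    d = (-bond * math.cos(ang), bond * math.sin(ang) * math.cos(tor),
         bond * math.sin(ang) * math.sin(tor))
    return tuple(c[k] + d[0] * bc[k] + d[1] * m[k] + d[2] * n[k] for k in range(3))


def dihedral(a, b, c, d):
    """Signed dihedral a-b-c-d in degrees (IUPAC sign convention)."""
    b1, b2, b3 = _sub(b, a), _sub(c, b), _sub(d, c)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(sum(w * w for w in b2)) * sum(u * v for u, v in zip(b1, n2))
    x = sum(u * v for u, v in zip(n1, n2))
    return math.degrees(math.atan2(y, x))


def build_backbone(phi, psi, omega):
    """Return [(N, CA, C), ...]; phi[0], psi[-1] and omega[-1] are unused."""
    t = math.radians(ANG_N_CA_C)
    residues = [((0.0, 0.0, 0.0), (N_CA, 0.0, 0.0),
                 (N_CA - CA_C * math.cos(t), CA_C * math.sin(t), 0.0))]
    for i in range(1, len(phi)):
        n_prev, ca_prev, c_prev = residues[-1]
        n_i = place(n_prev, ca_prev, c_prev, C_N, ANG_CA_C_N, psi[i - 1])
        ca_i = place(ca_prev, c_prev, n_i, N_CA, ANG_C_N_CA, omega[i - 1])
        c_i = place(c_prev, n_i, ca_i, CA_C, ANG_N_CA_C, phi[i])
        residues.append((n_i, ca_i, c_i))
    return residues
```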

[0103] The side-chain atoms of a polypeptide were built in the same way as the backbone atoms. The bond lengths and bond angles were taken from published values (Momany et al., 1975). There are at most four types of side-chain dihedral angles (χ1, χ2, χ3, χ4) for the 20 amino acids. Surveys of crystallographic structures of proteins and small peptides show that certain χ angle values are highly favored (Janin et al., 1978; Benedetti et al., 1983). Ponder and Richards (1987) studied the populations of the χ angles using 2273 residues obtained from 19 protein crystal structures with a resolution better than 1.8 Å and an R-factor below 0.18. Ponder and Richards found that the χ angles were concentrated around a few preferred values, with standard deviations smaller than previously published values. This indicates that a rotamer library can approximately represent the behavior of a side chain. The rotamer library of Ponder and Richards (1987) was used for all the side-chain dihedral angles. To minimize bias when adding rotamers, only rotamers that did not collide with the backbone atoms of each residue were selected. A set of rotamers, one per residue, was then picked at random. However, this set of rotamers may have a van der Waals contact violation, so three random tries were allowed to increase the chance of obtaining a suitable set of rotamers. If none of the three tries passes the van der Waals check, the backbone is rejected and the procedure is repeated.
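The three-try rotamer step can be sketched as follows. The per-residue lists are assumed to be pre-filtered for backbone collisions, as described above; `clashes` is a hypothetical callback standing in for the van der Waals check, and residues without a rotatable side chain are represented by a single placeholder entry.

```python
import random


def choose_rotamers(allowed, clashes, rng, tries=3):
    """Pick one rotamer per residue at random, up to `tries` times.

    allowed: per-residue lists of backbone-compatible rotamers (non-empty).
    clashes: callback returning True if a full rotamer set violates van der Waals limits.
    Returns the first clash-free set, or None (the caller then rejects the backbone).
    """
    for _ in range(tries):
        candidate = [rng.choice(options) for options in allowed]
        if not clashes(candidate):
            return candidate
    return None
```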

Example 2 

Determination of the dihedral angles (φ, ψ, ω)

[0104] The subroutine ran2.f from Numerical Recipes (Press et al., 1986) was used to generate random numbers uniformly distributed in the range 0.0 to 1.0. These random values have no sequential correlation, and the period of the generator is practically infinite. The conformational angles (φ, ψ) are assigned as ax + b, where x is a random value with 0 < x < 1.0, and a and b are two constants adjusted so that the two angles are restricted to the allowed regions of the Ramachandran plots.
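The ax + b mapping, combined with region weights like those in FIG. 3 (less-favored regions receiving 30% fewer points), can be sketched as below. This is illustrative only: Python's `random` module (a Mersenne Twister) is used in place of ran2.f, the rectangular region bounds in the check are made-up placeholders rather than the actual map boundaries, and the function names are hypothetical.

```python
import random


def angle_from_uniform(x, lo, hi):
    """Map x in [0, 1) to [lo, hi) via a*x + b, with a = hi - lo and b = lo."""
    return (hi - lo) * x + lo


def sample_phi_psi(boxes, rng):
    """Draw (phi, psi) from rectangular regions chosen in proportion to their weight.

    boxes: list of (weight, (phi_lo, phi_hi), (psi_lo, psi_hi)) in degrees.
    """
    weights = [w for w, _, _ in boxes]
    _, (p_lo, p_hi), (s_lo, s_hi) = rng.choices(boxes, weights=weights, k=1)[0]
    return (angle_from_uniform(rng.random(), p_lo, p_hi),
            angle_from_uniform(rng.random(), s_lo, s_hi))
```

Real allowed regions are not rectangles, so a practical version would either tile each region with several boxes or reject points falling outside the region outline.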
TABLE 1

Bond angle              (°)        Bond length     (Å)
Cα(i-1)-C(i-1)-N(i)     115.0      C(i-1)-N(i)     1.325*
C(i-1)-N(i)-Cα(i)       121.0      N(i)-Cα(i)      1.453
C(i)-Cα(i)-N(i)         109 ± 5    C(i)-Cα(i)      1.530
Cα(i)-C(i)-O(i)         120.5      C(i)-O(i)       1.230
N(i)-Cα(i)-Cβ(i)        109.5      Cα(i)-Cβ(i)     1.530
N(i)-Cα(i)-H(i)         109.5      Cα(i)-H(i)      1.020
?                       121.0      N(i)-H(i)       0.940


Table 1. The bond lengths and bond angles used for building the polypeptide. The bond angle
C-Cα-N is not as rigid as the other bond angles, so it is allowed to vary by ±5° around its
average value of 109°.

* If residue i is Pro, the bond length C(i-1)-N(i) is 1.355 Å.
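Building atoms from the internal coordinates of Table 1 and the sampled dihedral angles can be sketched with the standard NeRF (natural extension reference frame) construction. The source does not name its placement method, so NeRF is an assumed, swapped-in technique here, and `place_atom` is a hypothetical helper.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]

def _sub(u: Vec, v: Vec) -> Vec:
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])

def _cross(u: Vec, v: Vec) -> Vec:
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def _unit(u: Vec) -> Vec:
    n = math.sqrt(u[0] ** 2 + u[1] ** 2 + u[2] ** 2)
    return (u[0] / n, u[1] / n, u[2] / n)

def place_atom(a: Vec, b: Vec, c: Vec,
               bond: float, angle_deg: float, torsion_deg: float) -> Vec:
    """Place atom d so that |c-d| = bond, angle b-c-d = angle_deg and
    dihedral a-b-c-d = torsion_deg (NeRF construction)."""
    theta, tau = math.radians(angle_deg), math.radians(torsion_deg)
    bc = _unit(_sub(c, b))
    n = _unit(_cross(_sub(b, a), bc))   # normal to the a-b-c plane
    m = _cross(n, bc)                   # completes the right-handed local frame
    # Coordinates of d in the local frame (bc, m, n), relative to c.
    x = -bond * math.cos(theta)
    y = bond * math.sin(theta) * math.cos(tau)
    z = bond * math.sin(theta) * math.sin(tau)
    return tuple(c[k] + x * bc[k] + y * m[k] + z * n[k] for k in range(3))
```

For example, N(i) could be placed from N(i-1), Cα(i-1) and C(i-1) with bond length 1.325 Å (1.355 Å if residue i is Pro), bond angle Cα(i-1)-C(i-1)-N(i) = 115.0° and torsion ψ(i-1).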

[0105] In order to sample the conformational space efficiently, the backbone dihedral
angles (φ, ψ) of a protein are divided into four categories: one for glycine, one for proline,
one for the Cβ-branched amino acids (VAL, ILE and THR), and one for all other amino
acids. Glycine, with a single hydrogen atom as its side chain, can adopt a wide range of
conformations, and its map is symmetrical because there is no R substituent on the alpha
carbon (Cα). Proline adopts only a very narrow range of conformational space because its
pyrrolidine ring, attached to the N and Cα atoms, restrains the conformation greatly.
Alanine is a prototype L-amino acid whose conformational space can approximately
represent that of the other amino acids except proline and glycine. However, because of the
two branches on their Cβ atoms, VAL, ILE and THR have a more restricted conformational
space than ALA (Scheraga, 1992; Chakrabarti & Pal, 1998). FIG. 3 shows the distribution of
the conformational angles on the four maps. As the number of random values becomes
sufficiently large, the points distribute evenly over the allowed regions.

[0106] MacArthur & Thornton (1993) analyzed the conformational angles of proline
residues from non-homologous X-ray crystal structures determined to 2.5 Å resolution or
better. They showed that the majority of the conformational angles cluster about the mean
values φ, ψ = -61°, -35° for the α region and φ, ψ = -65°, 150° for the β region. Early
computations (e.g., Ramachandran et al., 1963; Nemethy and Scheraga, 1965) with a hard-sphere
potential, a good first-order approximation, showed that the region around ψ = 0°
was not allowed. When a realistic potential was used, this region became partially allowed.
Based on this consideration, the inventors divided the map into favored regions, bounded
by solid lines, and less-favored regions, bounded by dashed lines. The less-favored regions
were given a chance of occurrence 60-80% less than that of the favored regions. For all amino
acids except glycine and proline, only 5% of the random values were assigned to the
positive-φ regions. This conformation assignment is similar to the one proposed by
Sowdhamini et al. (1993), with two differences: 1) the inventors sampled conformational
space more closely following the Ramachandran plots and added two small areas to the
positive-φ region, and 2) the angle distribution for each map is "non-even". These aspects
speed up the modeling of the peptide significantly.
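The residue classification and the biased choice of sampling region can be sketched as follows. The region boxes and weights used in the check are hypothetical placeholders, not the boundaries drawn in FIG. 3; giving a less-favored region a weight of about 0.2-0.4 relative to a favored region is one assumed reading of the 60-80% reduction described above. Angles within the chosen region would then be drawn as ax + b.

```python
import random

def ramachandran_map(residue: str) -> str:
    """Return which of the four sampling maps a residue uses."""
    r = residue.upper()
    if r == "GLY":
        return "gly"
    if r == "PRO":
        return "pro"
    if r in {"VAL", "ILE", "THR"}:
        return "beta_branched"
    return "general"

def choose_region(residue, regions, rng, positive_phi_share=0.05):
    """Pick a region (name, phi_range, psi_range, weight) for one residue.

    For non-Gly, non-Pro residues, positive_phi_share of the draws go to
    regions with phi > 0. Within the chosen sign, regions are drawn in
    proportion to their weight, so a less-favored region can be given a
    lower weight than a favored one.
    """
    if ramachandran_map(residue) in {"beta_branched", "general"}:
        want_positive = rng.random() < positive_phi_share
        regions = [r for r in regions if (r[1][0] >= 0.0) == want_positive]
    # Fails if no region of the requested sign was supplied.
    return rng.choices(regions, weights=[r[3] for r in regions])[0]
```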

[0107] The trans and cis forms of the peptide bond have dihedral angles of ω = 180° and 0°,
respectively. Non-Pro amino acids favor the trans form by a ratio of approximately 1000:1,
whereas with proline the trans form is favored only by about 4:1. Therefore, the non-Pro
amino acids were set to 100% trans, and proline was given up to 5% (user-adjustable) cis form
and 95% trans form. The dihedral angle ω is also allowed to fluctuate by about 5° around
180° or 0°.
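The ω assignment can be sketched as below. The uniform ±5° jitter is an assumed reading of "a fluctuation about 5°", and the cis option is applied here to the peptide bond preceding a Pro residue (the X-Pro bond), which is an assumption about how the rule is indexed.

```python
import random

def sample_omega(residue: str, rng: random.Random,
                 pro_cis_fraction: float = 0.05, jitter: float = 5.0) -> float:
    """Return omega (degrees) for the peptide bond preceding `residue`.

    Non-Pro residues are always trans (180 deg); for Pro the bond is cis
    (0 deg) with probability pro_cis_fraction. A uniform fluctuation of
    +/- jitter degrees is added in both cases.
    """
    is_cis = residue.upper() == "PRO" and rng.random() < pro_cis_fraction
    centre = 0.0 if is_cis else 180.0
    return centre + rng.uniform(-jitter, jitter)
```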

Example 3 
van der Waals steric contacts 

[0108] A hard-sphere atom model was used as the scoring function to eliminate
grossly improbable conformers. Ramachandran et al. (1963) used X-ray data to determine a
list of contact distances for each kind of atom pair occurring in proteins (see the "normal" and
"extreme" distances in Table 2). These distances are about 0.3-0.5 Å smaller than the sum
of the van der Waals radii of the two atoms (Gavezzotti, 1983). Gavezzotti concluded that a
structure is less stable at the "extreme limit" distance than at the "normal limit" distance.
However, short contact distances usually reach the extreme limit when there are hydrogen
bonds or other attractive effects. Iijima et al. (1987) calibrated the van der Waals radii of
atoms using an inverse Ramachandran plot. The calibrations were based on comparing the
Ramachandran plots obtained from high-resolution X-ray data of proteins and peptides with
the allowed conformational space of dipeptide models built from published standard bond
angles and bond lengths. The calibrated contact distances for each atom pair are about 0.1 to
0.2 Å shorter than the "extreme limit" (Table 2).
[0109] Of the three kinds of short contact distances, both the "extreme limit" and the
"calibrated" distances were used for the van der Waals check, unless otherwise specified.
The "extreme" distances were used only when checking backbone atom pairs. The
"calibrated" distances were used for side chain-side chain and side chain-backbone atom
pair checks. The reasons for using two contact distances were: (1) to give some flexibility to
the backbone and slightly more flexibility to the side chains, (2) to compensate for the
inaccuracy caused by the fixed geometric parameters used to build the polypeptides, (3) to
include hydrogen bonds or other attractive features in the conformer, and (4) to save
computing time, especially when side chain atom pairs are involved in van der Waals checks.
It should be noted that, for each atom, all possible non-bonded atom pairs are checked; atom
pairs within the same residue are not checked.
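The two-threshold hard-sphere check can be sketched as follows, using the extreme and calibrated columns of Table 2. The `Atom` record and the `bonded` exclusion set are hypothetical inputs: the text says only non-bonded pairs are checked, so covalently linked pairs across residues (for example C(i-1)-N(i)) must be supplied by the caller.

```python
import math
from dataclasses import dataclass
from itertools import combinations

# Short contact distances in Angstrom from Table 2: (extreme, calibrated).
CONTACT = {
    ("H", "H"): (1.9, 1.9), ("H", "O"): (2.2, 2.2), ("H", "N"): (2.2, 2.3),
    ("C", "H"): (2.2, 2.3), ("O", "O"): (2.6, 2.4), ("N", "O"): (2.6, 2.47),
    ("C", "O"): (2.7, 2.55), ("N", "N"): (2.6, 2.54), ("C", "N"): (2.8, 2.62),
    ("C", "C"): (2.9, 2.7), ("H", "S"): (2.3, 2.3), ("O", "S"): (2.8, 2.7),
    ("N", "S"): (2.85, 2.77), ("C", "S"): (2.95, 2.85), ("S", "S"): (3.0, 3.0),
}

@dataclass(frozen=True)
class Atom:
    residue: int
    element: str      # "C", "N", "O", "H" or "S"
    backbone: bool
    xyz: tuple

def passes_vdw(atoms, bonded=frozenset()):
    """Hard-sphere check over all atom pairs from different residues.

    `bonded` holds index pairs (i, j), i < j, that are covalently linked
    across residues and are therefore skipped.
    """
    for i, j in combinations(range(len(atoms)), 2):
        a, b = atoms[i], atoms[j]
        if a.residue == b.residue or (i, j) in bonded:
            continue
        extreme, calibrated = CONTACT[tuple(sorted((a.element, b.element)))]
        limit = extreme if (a.backbone and b.backbone) else calibrated
        if math.dist(a.xyz, b.xyz) < limit:
            return False
    return True
```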

TABLE 2

Atom pair    Normal (Å)    Extreme (Å)    Calibrated (Å)
H...H        2.0           1.9            1.9
H...O        2.4           2.2            2.2
H...N        2.4           2.2            2.3
H...C        2.4           2.2            2.3
O...O        2.7           2.6            2.4
O...N        2.7           2.6            2.47
O...C        2.8           2.7            2.55
N...N        2.7           2.6            2.54
N...C        2.9           2.8            2.62
C...C        3.0           2.9            2.7
H...S        2.5           2.3            2.3
O...S        2.9           2.8            2.7
N...S        2.95          2.85           2.77
C...S        3.05          2.95           2.85
S...S        3.1           3.0            3.0



Table 2. The short contact distances between each kind of atom pair. The normal and
extreme distances are from Ramachandran et al. (1963). The calibrated distances are from
Iijima et al. (1987).
Example 4
Modeling disulfide bonds 

[0110] The present invention includes a flexible method to search for potential disulfide
bonds in a structure. For a specified cysteine i, the coordinates of the sulfur atom S(i) are
generated from the torsion angle χ1 (N-Cα-Cβ-S) and the coordinates of atoms N(i), Cα(i)
and Cβ(i). The possible positions of S(i) lie on a circle formed by rotating about the
Cα(i)-Cβ(i) bond through the rotational angle χ1. Statistics show that the favored dihedral
angles χ1 are around -60°, 60° and 180°. In the present invention, a wider region around each
angle was scanned, i.e., χ1 from -20° to -100°, 20° to 100° and 140° to 220°. The coordinates
of S(i) are recorded every four degrees when rotating about the Cα(i)-Cβ(i) bond. The same
procedure is applied to the specified residue j. The distances are checked for all generated
atom pairs S(i) and S(j) on both circles. FIG. 4 shows one of the generated sulfur pairs.
If the distance between S(i) and S(j) is within 2.04 ± 0.4 Å, the bond angles Cβ(i)-S(i)-S(j)
and Cβ(j)-S(j)-S(i) are within 104 ± 5°, and the dihedral angle χSS (Cβ(i)-S(i)-S(j)-Cβ(j)) is
within |90| ± 40°, it was assumed that the two cysteines could form a disulfide bond. To
ensure that the generated S atom positions are in good geometry with all other atoms of the
conformer, a van der Waals check was also performed. This rejects many position pairs,
especially when the Cα(i)-Cβ(i) bond of the first Cys is approximately in line with the
Cα(j)-Cβ(j) bond of the second Cys. The best position pair was then selected.
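The sulfur-circle scan and the acceptance criteria can be sketched as follows. The Cβ-Sγ bond length (1.81 Å) and Cα-Cβ-Sγ angle (114°) are assumed standard cysteine values not given in this section, the tie-break used for the "best" pair (S-S distance closest to 2.04 Å) is an assumption, and the van der Waals check of the sulfur positions against the rest of the conformer is omitted.

```python
import math
from itertools import product

def _sub(u, v):
    return tuple(x - y for x, y in zip(u, v))

def _dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def _cross(u, v):
    return (u[1]*v[2] - u[2]*v[1], u[2]*v[0] - u[0]*v[2], u[0]*v[1] - u[1]*v[0])

def _unit(u):
    n = math.sqrt(_dot(u, u))
    return tuple(x / n for x in u)

def place_atom(a, b, c, bond, angle_deg, torsion_deg):
    """NeRF: atom d with |cd| = bond, angle b-c-d and dihedral a-b-c-d."""
    th, tau = math.radians(angle_deg), math.radians(torsion_deg)
    bc = _unit(_sub(c, b))
    n = _unit(_cross(_sub(b, a), bc))
    m = _cross(n, bc)
    d = (-bond * math.cos(th), bond * math.sin(th) * math.cos(tau),
         bond * math.sin(th) * math.sin(tau))
    return tuple(c[k] + d[0] * bc[k] + d[1] * m[k] + d[2] * n[k] for k in range(3))

def angle(a, b, c):
    """Angle a-b-c in degrees."""
    u, v = _sub(a, b), _sub(c, b)
    cos = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

def dihedral(a, b, c, d):
    b1, b2, b3 = _sub(b, a), _sub(c, b), _sub(d, c)
    n2 = _cross(b2, b3)
    return math.degrees(math.atan2(math.sqrt(_dot(b2, b2)) * _dot(b1, n2),
                                   _dot(_cross(b1, b2), n2)))

# Scan windows from the text, every 4 degrees.
CHI1 = [x for lo, hi in ((-100, -20), (20, 100), (140, 220)) for x in range(lo, hi + 1, 4)]
CB_SG, CA_CB_SG = 1.81, 114.0   # assumed standard Cys geometry

def sulfur_circle(n, ca, cb):
    return [place_atom(n, ca, cb, CB_SG, CA_CB_SG, chi) for chi in CHI1]

def find_disulfide(res_i, res_j):
    """res_i, res_j = (N, CA, CB). Returns (d_SS, S_i, S_j) or None."""
    cb_i, cb_j = res_i[2], res_j[2]
    best = None
    for s_i, s_j in product(sulfur_circle(*res_i), sulfur_circle(*res_j)):
        d = math.dist(s_i, s_j)
        if abs(d - 2.04) > 0.4:
            continue
        if abs(angle(cb_i, s_i, s_j) - 104.0) > 5.0 or abs(angle(cb_j, s_j, s_i) - 104.0) > 5.0:
            continue
        if abs(abs(dihedral(cb_i, s_i, s_j, cb_j)) - 90.0) > 40.0:
            continue
        if best is None or abs(d - 2.04) < abs(best[0] - 2.04):
            best = (d, s_i, s_j)
    return best
```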

[0111] To test the procedure for predicting disulfide bonds, 19 disulfide bonds that were
not successfully modeled by Sowdhamini et al. (1989) were examined. Table 3 lists the data for
the modeled and crystallographically observed disulfide bonds. All of the disulfide bonds were
successfully predicted (Table 3). However, two disulfide bonds had to be modeled by relaxing
the criterion for the torsion angle χSS beyond |90| ± 40°.
TABLE 3

Protein                  d(S-S)  χSS    χ1(i)  χ1(j)  χ2(i)  χ2(j)  Cβ(i)-S(i)-S(j)  Cβ(j)-S(j)-S(i)
                         (Å)     (°)    (°)    (°)    (°)    (°)    (°)              (°)
Azurin                   1.98    -82    -65    -57    -69    -72    103              102
(2AZA) 3B-26B            1.95    -87    -66    -60    -66    -70    101              102
Carboxypeptidase         1.89     93    -79    -74    144    -57    103              105
(5CPA) 138-161           1.93     92    -84    -78    138    -47    109               99
Lysozyme (HEW)           2.11     80    -35   -178    148     42    105              104
(6LYZ) 76-94             ?        70    ?      ?      ?       44    101              107
Lysozyme (Human)         2.08     95    62.6   -71     81    -58    103              107
(1LZ1) 65-81             2.30     94     62    -66     82    -58    104              102
Ovomycoid, third domain  1.99     99   -179    -70     65    -47    105              104
(2OVO) 24-56             1.99    105   -177    ?      ?      ?      ?                ?
Rat mast cell protease   2.02    -88    -81    -71   -153    -86    103              108
(3RP2) 42A-58A           2.20    -86    -84    -66   -155    -85    100              106
168A-182A                1.99    159    -52    -92     82    -50    103              113
                         2.02    168    -48    -81     94    -73    101              106
Glutathione reductase    2.06   -133    178    -32     80    118    101               92
(3GRS) 58-63             2.04   -136    178    -31     78    120    100               92
Phospholipase A2         2.0?    108    -64    176    -54     72    101              100
(1BP2) 61-91             2.01    110    -64    178    -56     71    101              100
β-Trypsin                1.98    -77    -98    -78   -139    -82    105              109
(1TPP) 42-58             2.16    -72   -108    -70   -144    -81    103              108
β-Trypsin complex        2.03    -86    -82    -73   -142    -80    107              106
(2PTC) 42E-58E           2.06    -85    -82    -73   -142    -79    105              106
Immunoglobulin FAB       2.01     84    171     60    110    164    105              106
                         ?       ?      ?      ?      ?      ?      ?                ?
Proteinase A             2.03    -87    -70    -64   -146   -108    100              104
(2SGA) 42-58             1.90    -93    -76    -58   -145   -107    107              100
191-220                  2.05    101     68    -76     95    -55    104              105
                         2.16     93     63    -82     96    -44    106              105
Wheat germ agglutinin    2.00     90    -62    -59    144    -60    105              109
(3WGA) 17A-31A           1.95    103    -64    -55    136    -66    104              100



Table 3. The first column gives the protein name, the four-letter PDB code and the
residue pair forming the disulfide bond. The remaining columns are: d(S-S), the distance
between the two sulfur atoms; χSS, the torsion angle Cβ(i)-S(i)-S(j)-Cβ(j); χ1(i) and χ1(j), the
torsion angles N-Cα-Cβ-S for residues i and j; χ2(i) and χ2(j), the torsion angles Cα-Cβ-S-S for
residues i and j; and Cβ(i)-S(i)-S(j) and Cβ(j)-S(j)-S(i), the bond angles. In each row, the first
line gives the crystallographically observed data and the second line gives the modeled data.
Note a: in this case, the variation of the dihedral angle Cβ(i)-S(i)-S(j)-Cβ(j) is set to |90| ± 70°.

Example 5 
The MPMOD program 

[0112] The present invention comprises a mini-protein modeling (MPMOD) program
that performs a Monte Carlo search of conformational space. The approach for sampling the
Ramachandran maps was based on the program RANMOD (Sowdhamini et al., 1993). The
inventors used Ponder and Richards' rotamer library (1989) together with a subroutine to
generate side chains. Part of the program was written in standard Fortran-77, and the part of
the program that calculates the solvent accessible surface (SAS) energy and performs the
thermodynamic analysis was written in C (Hilser & Freire, 1996). FIG. 5 gives a flow chart
of the program as used to model disulfide bonds. In step 100, the sequence, disulfide bond
connectivity, and other parameters are input, either manually or by retrieval from a database
well known to and used by those of skill in the art. From the input data, dihedral angles
(φ, ψ, ω) are randomly generated in step 101. The generated dihedral angles are used to build
a polypeptide in step 102. Once the polypeptide is built, a van der Waals check is performed
in step 103. If the van der Waals check is passed, the SAS-based energy of the polypeptide is
calculated in step 104; if not, the dihedral angles are regenerated. Next, step 105 searches for
potential disulfide bonds in the generated polypeptide. After the disulfide bonds have been
modeled, a van der Waals check is performed to ensure that the sulfur atoms (S) are in good
geometry with all the other atoms. The best position pairs are chosen and the SAS-based
energy is calculated in step 106.
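The FIG. 5 flow can be summarized as a generate-and-test loop. In this hypothetical sketch every step is a caller-supplied callback, and a conformer that passes the van der Waals check but forms no disulfide bond is kept as a linear conformer with its step-104 energy; the source does not say how such conformers are handled in this flow.

```python
from typing import Any, Callable, Dict, List, Optional

def generate_ensemble(
    n_wanted: int,
    sample_angles: Callable[[], Any],                  # step 101
    build: Callable[[Any], Any],                       # step 102
    passes_vdw: Callable[[Any], bool],                 # step 103
    sas_energy: Callable[[Any], float],                # steps 104 and 106
    model_disulfides: Callable[[Any], Optional[Any]],  # step 105, incl. S vdW check
    max_attempts: int = 1_000_000,
) -> List[Dict[str, Any]]:
    """Generate-and-test loop following the FIG. 5 steps (callbacks hypothetical)."""
    records: List[Dict[str, Any]] = []
    for _ in range(max_attempts):
        if len(records) >= n_wanted:
            break
        conformer = build(sample_angles())
        if not passes_vdw(conformer):
            continue                      # rejected: regenerate dihedral angles
        record = {"conformer": conformer, "dG": sas_energy(conformer)}
        closed = model_disulfides(conformer)
        if closed is not None:            # best S-S position pairs were found
            record["closed"] = closed
            record["dG_closed"] = sas_energy(closed)
        records.append(record)
    return records
```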

[0113] The MPMOD program is designed either to generate only disulfide-bonded conformers
or to generate both disulfide-bonded and linear conformers. When the program is run to
generate only disulfide-bonded conformers, it is considered "the fast mode", which is
illustrated in FIG. 13A and FIG. 13B. In step 200, the sequence, disulfide bond connectivity,
and other parameters are input, either manually or by retrieval from a database well known
to and used by those of skill in the art. From the input data, dihedral angles (φ, ψ, ω) are
randomly generated in step 201 and assigned to each residue of the backbone. The generated
dihedral angles are used to generate backbone atoms, starting from three given atoms, in
step 202. The distance pairs are checked in step 203; it is important to determine the distance
between the two Cα atoms and the distance between the two Cβ atoms of the cysteines. If
either distance is not acceptable, the dihedral angles are regenerated. The distance between
the cysteines (C) plays a role in the rate of loop closure. If the distances are acceptable, a
van der Waals check is performed in step 204. If the van der Waals check is passed, the rest of
the backbone is generated in step 205; if not, the dihedral angles are regenerated. If the van der
Waals check remains acceptable while the backbone is generated, the disulfide bonds are
modeled in step 206; otherwise, the dihedral angles are regenerated. Next, rotamers (side
chains) are added to the backbone in step 207; rotamers are added to every residue except the
cysteines. From step 207, all combinations without van der Waals violations can be collated
in step 208 and the dihedral angles regenerated, and in step 209 the backbone and all rotamer
combinations are written to a file. If the van der Waals check is acceptable for each rotamer in
step 210, the disulfide-bonded pairs are checked in step 211 to ensure that the sulfur atoms (S)
are in good geometry with all the other atoms. If all the checks are acceptable, the backbone
angles and other information are written to a file in step 212. Next, a binding test is
performed in step 213 for each conformer with the receptor to determine which conformer has
a higher binding affinity. Finally, the SAS-based energy is calculated in step 214.

[0114] As mentioned, the MPMOD program can also generate both disulfide-bonded
conformers and linear conformers. This mode is considered "the slow mode", which is
illustrated in FIG. 14A and FIG. 14B. In step 300, the sequence, disulfide bond connectivity,
and other parameters are input, either manually or by retrieval from a database well known
to and used by those of skill in the art. From the input data, dihedral angles (φ, ψ, ω) are
randomly generated in step 301 and assigned to each residue of the backbone. The generated
dihedral angles are used to generate backbone atoms, starting from three given atoms, in
step 302. Next, the rest of the backbone is generated in step 303. If the van der Waals check is
acceptable, rotamers (side chains) are added to every residue of the backbone in step 304.
After the rotamers are added, step 305 checks the distance pairs, models the disulfide bonds,
and performs a van der Waals check of the S-S pairs against the complete conformer. If any
check in step 305 fails, the number of the conformer that cannot form an S-S bond is recorded,
and the program is linked to the COREX program to calculate the SAS-based energy ΔG for
each such conformer in step 308. If all checks in step 305 pass, the number of the S-S bonded
conformer is recorded and the SAS-based energy ΔG of each conformer is calculated in
step 306. After the calculations, each conformer is written to a file in step 307.

[0115] Further, the MPMOD program is capable of performing loop generation, as
shown in FIG. 15. In step 400, the two residue numbers delimiting the flexible loop of the
protein and the desired accuracy are input, either manually or by retrieval from a database
well known to and used by those of skill in the art. From the input data, dihedral angles
(φ, ψ, ω) are randomly generated in step 401 and assigned to each residue of the backbone.
The generated dihedral angles are used to generate the backbone (main-chain) atoms. The
distance pairs are checked in step 403; it is important to determine the distances between the
two Cα atoms and between the N and C termini of the conformer. In step 404, the distance
between the N and C termini of the conformer is minimized by altering the dihedral angles.
Step 405 requires that the handedness of the conformer be the same as that of the cut-out part
of the target protein. A van der Waals check of the main-chain atom pairs is performed in
step 406. If the van der Waals check is acceptable, the conformers are aligned to the target
protein in step 407. A van der Waals check of the main-chain and target protein atom pairs is
performed in step 408. If it is acceptable, rotamers (side chains) are added to the main chain in
step 409. If the van der Waals check is acceptable for each rotamer in step 409, the
information is written to a file in step 410.

[0116] The disulfide bond modeling module of the MPMOD program is illustrated in
FIG. 16. In step 500, the coordinates of N, Cα and Cβ of the two cysteines are obtained. Next,
in step 501, a distance check is performed for Cα to Cα and Cβ to Cβ. If a distance is not
acceptable, other coordinates must be obtained in step 500. If the distances are acceptable, the
Sγ atom (SG) is generated on the circle formed by rotation about the Cα-Cβ bond in step 502.
Next, the bond length, bond angles, and dihedral angle are checked in step 503. If the
measurements in step 503 are acceptable, the disulfide bond is formed and the coordinates are
written to a file in step 504.

[0117] The binding test module of the MPMOD program is illustrated in FIG. 17.
Step 600 requires the PDB coordinates of the generated conformers and of the crystal
structure, the sequence segment used for both alignments, the criteria for the best alignments,
and three options for the "binding" test. Once all the information is gathered, each conformer
is aligned to the corresponding peptide crystal structure in step 601. Next, the root mean
square deviation between each modeled conformer and the target peptide is determined, along
with the average conformational angle difference per residue between the two conformers.
If these values are acceptable, a van der Waals check of each conformer against the protein is
performed in step 603. If the van der Waals check is acceptable, the SAS-based energy of each
conformer is calculated in step 604 and the statistics are performed in step 605.
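The alignment and comparison in the binding test can be sketched with a Kabsch superposition followed by an RMSD, plus a wrapped average of per-residue angle differences. The source does not name its alignment algorithm, so Kabsch (via NumPy's SVD) is a swapped-in, commonly used choice, and the function names are hypothetical.

```python
import numpy as np

def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition."""
    P = P - P.mean(axis=0)               # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                          # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0.0 else -1.0   # avoid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))

def mean_angle_difference(a_deg, b_deg) -> float:
    """Average absolute difference of paired dihedral angles, wrapped to [0, 180]."""
    diffs = [abs((x - y + 180.0) % 360.0 - 180.0) for x, y in zip(a_deg, b_deg)]
    return sum(diffs) / len(diffs)
```

Wrapping the angle difference matters for dihedrals: 179° and -179° differ by 2°, not 358°.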

Example 6 

The loop closure rate for the cyclic peptides CXC and CXXC 

[0118] Conformational ensembles for a series of CXC and CXXC, where X is one of 
the amino acids Ala, Val, Pro and Gly, were generated. Each ensemble consisted of 4000 
conformers that can form a disulfide bond. The side chain was added to the backbone for 
each conformer. Only the backbone hydrogen atoms (HN and HCA) were generated in each 
conformer. 
[0119] The disulfide bond loop closure "rate", or probability, may be defined as Nc/N0,
where Nc is the number of conformers that can potentially form a disulfide bond and N0 is the
number of conformers that cannot form a disulfide bond but have passed the van der Waals
check. This definition is similar to that of the equilibrium constant Kc. Table 4 lists the loop
closure rates from the modeling and the experimental data for each type of peptide. The
relative modeled values follow the same pattern as the experimental data: the rate for CPC is
the highest in the CXC series, and CPPC has the lowest rate in the CXXC series. To compare
the two sets of values directly, they were scaled to the same level. The common scale factor
for CXC and CXXC can be defined as K = Σ(Exp)/Σ(Mod), where Σ(Exp) is the sum of all
the experimental values and Σ(Mod) is the sum of all the modeled values. Each individual
modeled value is multiplied by the scale factor K. FIG. 6 compares the scaled modeled values
with the experimental values. The values are in agreement for the CXXC series.
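The scaling step is simple arithmetic; the sketch below applies it to the CXXC values of Table 4. Computing one factor per series is an assumed reading of "common scale factor", and the function names are hypothetical.

```python
def scale_factor(exp_values, mod_values):
    """K = sum(Exp) / sum(Mod)."""
    return sum(exp_values) / sum(mod_values)

def scaled(mod_values, k):
    """Multiply each modeled value by K."""
    return [k * m for m in mod_values]

# Table 4, CXXC series, in the order Ala, Val, Pro, Gly.
cxxc_mod = [0.604, 0.748, 0.270, 0.357]
cxxc_exp = [71.0, 71.0, 20.0, 31.0]
k = scale_factor(cxxc_exp, cxxc_mod)    # about 97.5 for these values
cxxc_scaled = scaled(cxxc_mod, k)
```

By construction the scaled modeled values sum to the experimental total (193.0 here), so the comparison in FIG. 6 is about the relative pattern rather than absolute magnitudes.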

[0120] Table 4 shows that the loop closure rates for the CXC series are much lower
than those for the CXXC series. This is determined by two factors: the distance between the two
cysteines and the flexibility of the backbone. Statistics show that, to form a disulfide bond, the
distance Cα(i)-Cα(j) between the Cα atoms of cysteines i and j must be within 4.0 to 7.0 Å,
and the distance Cβ(i)-Cβ(j) between the two Cβ atoms must be within 3.3 to 4.7 Å. The
inventors surveyed all of the conformers that passed the van der Waals check and found that
for the CXC series the average distance between the two Cα atoms was about 6.2-6.5 Å and
between the two Cβ atoms about 7.1-7.9 Å, with CPC having the shortest distances. The
averaged Cβ distance of the randomly generated conformers is far from the suitable distance.
Because CXC has only three residues, the backbone is not flexible enough to bring the Cβ
atoms closer unless the standard bond angles and bond lengths change. The ratio Ncacb/Nvdw,
where Ncacb is the number of conformers with suitable Cα and Cβ distances and Nvdw is the
total number of conformers that passed the van der Waals check, is 0.72%, 0.63%, 1.45% and
0.23% for CAC, CVC, CPC and CGC, respectively. For the CXXC series, the average Cα and
Cβ distances are 8.4-8.7 Å and 9.1-9.6 Å, respectively, with CPPC being the shortest. These
distances are even further from the standard distances for forming disulfide bonds, but the
backbone has a much higher degree of flexibility, so a higher percentage of the conformers
have suitable Cα and Cβ distances. The ratio Ncacb/Nvdw is 2.19%, 2.61%, 0.53% and 1.26%
for CAAC, CVVC, CPPC and CGGC, respectively.
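The Cα/Cβ distance windows above act as a cheap pre-filter before any sulfur placement. A minimal sketch of the test and of the Ncacb/Nvdw ratio, with hypothetical function names and the windows taken from the text:

```python
import math

CA_RANGE = (4.0, 7.0)   # Angstrom, Calpha(i)-Calpha(j)
CB_RANGE = (3.3, 4.7)   # Angstrom, Cbeta(i)-Cbeta(j)

def can_close(ca_i, ca_j, cb_i, cb_j) -> bool:
    """True if both inter-cysteine distances fall inside the statistical windows."""
    d_ca, d_cb = math.dist(ca_i, ca_j), math.dist(cb_i, cb_j)
    return CA_RANGE[0] <= d_ca <= CA_RANGE[1] and CB_RANGE[0] <= d_cb <= CB_RANGE[1]

def fraction_closable(pairs) -> float:
    """Ncacb / Nvdw for (ca_i, ca_j, cb_i, cb_j) tuples from vdW-passing conformers."""
    if not pairs:
        return 0.0
    return sum(can_close(*p) for p in pairs) / len(pairs)
```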
[0121] To form a disulfide bond, it is important not only to have suitable Cα and Cβ
distances but also for the two sulfur atoms to occupy good steric positions. The inventors
counted the number of conformers that satisfied the Cα and Cβ distance criteria and the
number of conformers that formed a disulfide bond. Although some conformers of CAC have
suitable Cα and Cβ distances, the probability of forming a disulfide bond is still smaller than
for CAAC because CAC lacks a suitable set of geometric parameters such as the S-S bond
length, the Cβ-S-S bond angle and the Cβ-S-S-Cβ torsion angle. The ratio Nc/Ncacb, where Nc
is the number of conformers that can form disulfide bonds, is 5.7%, 4.6%, 6.0% and 8.4% for
CAC, CVC, CPC and CGC, and 36.0%, 35.1%, 34.3% and 33.0% for CAAC, CVVC, CPPC and
CGGC. Therefore, the CXC series not only has a lower percentage of conformers with suitable
Cα and Cβ distances, but also a lower percentage of conformers in which the two sulfur atoms
have good geometric positions to form a disulfide bond. These factors lead to a lower
probability of loop closure for CXC than for CXXC.

TABLE 4

                 CXC                 CXXC
Amino acid       Mod       Exp       Mod       Exp
Ala              0.091     3.1       0.604     71.0
Val              0.066     1.6       0.748     71.0
Pro              0.129     7.2       0.270     20.0
Gly              0.093     1.5       0.357     31.0



Table 4. The probability from modeling (labeled Mod) is defined as Nc/N0, where Nc is the
number of conformers (4000) that can form an S-S bond and N0 is the number of conformers
that do not form an S-S bond but have passed the vdW check. The equilibrium constant
(labeled Exp) is defined as kc/ko, where kc is the loop closing and ko the loop opening rate
constant (Zhang and Snyder, 1989).

[0122] Comparing the loop closure probabilities of peptides with the same number of
residues, the probability for CPC is the largest and that for CVC the smallest in the CXC
series. In contrast, the probability for CPPC is the smallest and that for CVVC the largest in
the CXXC series. This is caused by special properties of Pro and Val. The ratios Nc/Ncacb
within each series are about the same, so only the ratios Ncacb/Nvdw differ significantly.
Because of its pyrrolidine ring, proline tends to pull the two cysteines toward each other, so
that the two cysteines of CPC have a better Cβ(i)-Cβ(j) distance and a higher chance of
forming a disulfide bond than the other members of the CXC series. In fact, the combination
XPX, where X is a non-proline amino acid, can easily form a β turn. On the other hand, Val
has three rotamers, χ1 = 180°, -60° and 60°. For the CVC peptide, the rotamer with χ1 = 180°
tends to push the two cysteines toward each other, but the rotamers with χ1 = -60° and 60°
tend to push the two cysteines away from each other, because the two branches on Cβ are
almost perpendicular to the backbone. It is more likely that the Cβ branches will push the two
cysteines apart, which is why CVC has the smallest loop closure probability. For the CXXC
conformers, the flexibility of the backbone plays a dominant role in loop closure. Since the
backbone of CPPC is much less flexible than those of the other peptides in the CXXC series,
the chance of loop closure for CPPC is also lower than for the other members of the series.

[0123] The inventors determined how many conformers are needed to obtain a
meaningful ratio Nc/N0. This ratio converges to a stable value as the number of conformers in
the ensemble increases. FIG. 7 shows how the ratio changes with increasing numbers of
conformers for each series. When the ensemble contains too few conformers, the values
fluctuate strongly; as the number of conformers increases, the ratio Nc/N0 converges to a
stable value, which may then be compared with the experimentally measured result. FIG. 7
also shows that it is possible to generate more conformers than necessary: one thousand
conformers per ensemble are enough to obtain a converged ratio. Beyond 1000 conformers,
the fluctuation is no larger than 0.003% for the CXC series and no larger than 0.05% for the
CXXC series.
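The convergence analysis can be sketched by tracking Nc/N0 as conformers accumulate and finding the point after which the ratio stays within a tolerance of its final value. The function names and the absolute-tolerance criterion are assumptions made for illustration.

```python
from typing import List, Optional, Sequence

def running_ratio(closed_flags: Sequence[bool]) -> List[Optional[float]]:
    """Nc/N0 after each vdW-passing conformer (True = forms the S-S bond).
    Entries are None until at least one unclosed conformer has been seen."""
    out: List[Optional[float]] = []
    nc = n0 = 0
    for closed in closed_flags:
        if closed:
            nc += 1
        else:
            n0 += 1
        out.append(nc / n0 if n0 else None)
    return out

def settled_after(ratios: Sequence[Optional[float]], tol: float) -> int:
    """First index from which every later ratio stays within tol of the final one."""
    final = ratios[-1]
    idx = len(ratios) - 1
    for k in range(len(ratios) - 1, -1, -1):
        r = ratios[k]
        if r is None or abs(r - final) > tol:
            break
        idx = k
    return idx
```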

Example 7 
The longer CXnC polypeptide

[0124] Zhang and Snyder (1989) also measured the equilibrium constant Kc for the
series CAnC, where n is from 1 to 5. They found that Kc decreases in the order CA2C, CA4C,
CA3C, CA5C, CA1C, with even numbers of Ala giving high values and odd numbers giving low
values (see line 3 of FIG. 8). The inventors' modeling using only the van der Waals
approximation does not agree with Zhang and Snyder's (1989) experimental results: as the
number of alanines between the two cysteines increases, the probability defined as Nc/N0
decreases monotonically for n > 2 (see line 4 of FIG. 8). The peak for CA4C was not captured
by the modeling. This is not surprising, because the inventors' calculation considers only
geometric factors, whereas the intramolecular interactions in the experiment are more
complicated. The inventors therefore surveyed N-H...O hydrogen bonds, limited to the
backbone. The criteria for forming a hydrogen bond are 120° < θ < 180° for the N-H...O bond
angle and d < 3.3 Å for the distance between the N and O atoms. The ratio of the number of
hydrogen bonds in the disulfide-closed conformers to the total number of conformers
decreases in the order CA2C, CA4C, CA5C, CA3C, CA1C, which is similar to the order of the
Kc constant (see line 1 of FIG. 8 for the ratio of H bonds). This indicates that the
even-numbered peptides CAnC are favored to have H bonds that stabilize the structure.
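The hydrogen-bond criteria can be written as a geometric test on N, H and O coordinates. In this sketch the N-H...O angle is measured at H, and a perfectly linear bond (exactly 180°) is accepted, which slightly relaxes the strict inequality in the text; the function names are hypothetical.

```python
import math

def _angle(a, b, c) -> float:
    """Angle a-b-c in degrees."""
    u = [x - y for x, y in zip(a, b)]
    v = [x - y for x, y in zip(c, b)]
    cos = sum(p * q for p, q in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

def is_backbone_hbond(n, h, o, max_no: float = 3.3, min_angle: float = 120.0) -> bool:
    """N-H...O test: d(N, O) < 3.3 A and 120 deg < angle(N-H...O) <= 180 deg."""
    return math.dist(n, o) < max_no and min_angle < _angle(n, h, o) <= 180.0
```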

[0125] The solvent accessible surface (SAS) energy ΔG was calculated. Since
hydrogen bonds are not considered in calculating ΔG, a compensation was applied to the
energy: for each H bond, the energy is increased by 0.5 units. The energy-weighted
probability is defined as

$$P = \frac{\sum_{i \in \text{closed}} e^{-\Delta G_i/RT}}{\sum_{j \in \text{unclosed}} e^{-\Delta G_j/RT}}$$

which is the ratio of the partition function for the closed peptides to the partition function
for all the unclosed peptides. This ratio follows the trend of the experimental results (see
line 2 of FIG. 8). Some conformers in the ensemble that can form an S-S bond have a high
probability, which leads to a high ratio. The peak for CA4C is slightly larger than those for
CA3C and CA5C.
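The energy-weighted ratio can be evaluated as below. The ΔG values are assumed to already include the per-H-bond compensation, RT = 0.593 kcal/mol (about 298 K) is an assumed default whose units must match ΔG, and the common energy shift is a standard numerical safeguard that cancels in the ratio.

```python
import math
from typing import Sequence

def energy_weighted_ratio(dg_closed: Sequence[float], dg_unclosed: Sequence[float],
                          rt: float = 0.593) -> float:
    """Sum of exp(-dG/RT) over closed conformers divided by the same sum
    over unclosed conformers."""
    # Shift by the overall minimum so the exponentials cannot overflow.
    g0 = min(min(dg_closed), min(dg_unclosed))
    z_closed = sum(math.exp(-(g - g0) / rt) for g in dg_closed)
    z_open = sum(math.exp(-(g - g0) / rt) for g in dg_unclosed)
    return z_closed / z_open
```

With all energies equal, the ratio reduces to the plain count ratio Nc/N0, which shows how the energy weighting generalizes the purely geometric probability.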

Example 8 

Construction of peptides for modeling the peptide-streptavidin complex 

[0126] The backbone dihedral angles (φ, ψ) of the peptide were randomly generated in
the four Ramachandran maps: one for glycine, one for proline, one for the Cβ-branched amino
acids (VAL, ILE and THR), and one for all other amino acids. The trans and cis forms of the
peptide bond have dihedral angles of ω = 180° and 0°, with a small random deviation (usually
within ±5°). The backbone of the peptide was generated from the dihedral angles (φ, ψ, ω)
and the standard bond lengths and bond angles. Ponder and Richards' rotamer library (1989)
was used to add side chains to the backbone. The simple hard-sphere approximation was used
to eliminate grossly improbable conformers: each atom was treated as a hard sphere with its
appropriate van der Waals radius, and the minimum distances of Iijima et al. (1987) between
two atoms were used for the van der Waals check of each atom pair. These distances are about
0.2 to 0.4 Å shorter than the "normal" distances of Ramachandran et al. (1963). If the
backbone hydrogen atoms (HN and HCA) are generated, the allowed overlap of an H atom
with other atoms is even larger (about 0.5 Å) than normal; otherwise, generating conformers
would be inefficient because of van der Waals violations.

Example 9 

Modeling disulfide bonds for the peptide-streptavidin complex 

[0127] A single disulfide bond can be modeled using the method of the present invention. When
two disulfide bonds are modeled in one conformer, attention should be paid to computational
efficiency. The probability that a polypeptide forms two disulfide bonds simultaneously is the
product of the probabilities for each disulfide bond to form; consequently, it takes a long time
to generate even one peptide with two disulfide bonds. The inventors have created an efficient
way to model a conformer with two disulfide bonds. For peptides with two disulfide-bonded loops,
the shorter loop is modeled first, and its conformation is fixed once it forms a disulfide bond.
If the fixed first loop does not have a suitable geometry, it may take a long time to find the
second loop. Therefore, only a limited number of tries is given to search for the second loop
while the conformation of the first loop is fixed; the number of tries is usually set between
about 5 and 10. Several polypeptides may thus be obtained with one fixed conformation of the
first loop and various conformations of the second loop. All conformers in the ensemble are kept
for the "binding" test.
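The two-stage search can be outlined in Python as below. This is a hypothetical sketch: `sample_short_loop` and `sample_long_loop` stand in for whatever routines generate a loop and test its disulfide geometry, and each returns None when closure fails. The short loop is sampled until it closes, then held fixed while the long loop gets a bounded number of tries (10 here, within the 5-10 range given above); every successful long loop is kept, so one short-loop conformation can contribute several conformers.

```python
import random
from typing import Any, Callable, List, Optional, Tuple


def sample_two_disulfide(
    sample_short_loop: Callable[[random.Random], Optional[Any]],
    sample_long_loop: Callable[[Any, random.Random], Optional[Any]],
    n_conformers: int,
    tries_per_short_loop: int = 10,
    max_short_loops: int = 10_000,
    seed: int = 0,
) -> List[Tuple[Any, Any]]:
    """Collect (short_loop, long_loop) pairs in which both loops close a disulfide bond."""
    rng = random.Random(seed)
    ensemble: List[Tuple[Any, Any]] = []
    for _ in range(max_short_loops):
        if len(ensemble) >= n_conformers:
            break
        short = sample_short_loop(rng)
        if short is None:
            continue  # short loop did not close; draw a new one
        # Hold the short loop fixed and give the long loop a limited number of tries.
        for _ in range(tries_per_short_loop):
            long_loop = sample_long_loop(short, rng)
            if long_loop is not None:
                ensemble.append((short, long_loop))
                if len(ensemble) >= n_conformers:
                    break
    return ensemble
```

If no long loop closes within the allotted tries, the fixed short loop is simply abandoned and a new one is drawn, which bounds the time spent on unfavorable first-loop geometries.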

[0128] If the polypeptide is cyclized by a covalent peptide bond (i.e., the nitrogen (N) of the
first residue forms a covalent bond with the carbon (C) of the last residue), the method for
modeling the disulfide bond is no longer valid. The criteria for forming such cyclic peptides
are 1.35 ± 0.6 Å for the N-C bond length and 120 ± 35° for the bond angles (CA-N-C or CA-C-N).
Generating such cyclic peptides is less efficient than generating a peptide with one disulfide
bond, because closure of the former is searched at only one position (of atom N or C), whereas
closure of the latter is searched over a number of sulfur positions.
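The cyclization criteria above translate directly into a geometric test. The Python sketch below is illustrative only; the function names are hypothetical, and it assumes that both the CA-N-C angle at the first residue's N and the CA-C-N angle at the last residue's C must lie within the 120 ± 35° window.

```python
import numpy as np


def bond_angle(a, vertex, b):
    """Angle a-vertex-b in degrees."""
    u, v = a - vertex, b - vertex
    cos_ang = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos_ang, -1.0, 1.0))))


def closes_head_to_tail(n_first, ca_first, c_last, ca_last,
                        bond=(1.35, 0.6), angle=(120.0, 35.0)):
    """Check the stated N(first)-C(last) peptide-bond criteria (value, tolerance)."""
    if abs(np.linalg.norm(n_first - c_last) - bond[0]) > bond[1]:
        return False
    for a, vertex, b in ((ca_first, n_first, c_last),   # CA-N-C angle at N
                         (ca_last, c_last, n_first)):   # CA-C-N angle at C
        if abs(bond_angle(a, vertex, b) - angle[0]) > angle[1]:
            return False
    return True
```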

Example 10 

Aligning the conformers to the binding site of streptavidin 

[0129] After the ensembles of conformers were generated, the "binding" test was performed. The
first step is to align each conformer to the template, which is the peptide in the co-crystal
structure of the complex. The second step is to screen the conformer using the hard-sphere
potential model. For the peptide-streptavidin complex, the dominant binding interactions occur
at the HPQ sequence of the peptide, so the modeled conformers were aligned to the corresponding
HPQ sequence of the crystal structure of the complex. Any higher-resolution X-ray crystal
structure can be used as the template. Two criteria were used to determine whether the
alignment is successful. One criterion is the root mean square deviation (rmsd) between each
modeled conformer k and the target peptide t:

$$\mathrm{rmsd}(k,t) = \left[\frac{1}{n}\sum_{j=1}^{n}\left\lVert \mathbf{x}(k,j) - \mathbf{x}(t,j)\right\rVert^{2}\right]^{1/2}$$

where n is the number of atoms participating in the alignment (n = 9 for the HPQ
sequence).

[0130] The other criterion is the average backbone dihedral-angle difference per residue between
the two conformers:

$$\Delta A(k,t) = \frac{1}{2m}\sum_{j=1}^{m}\Big(\lvert\phi(k,j)-\phi(t,j)\rvert + \lvert\psi(k,j)-\psi(t,j)\rvert\Big)$$

where m is the number of residues in the compared sequence (m = 3 for the HPQ
sequence).

[0131] To decide whether an alignment is acceptable, two common reference values, rmsd_ref and
ΔA_ref, are set for rmsd(k, t) and ΔA(k, t), respectively. For the kth conformer in the
ensemble, the alignment is acceptable if both rmsd(k, t) < rmsd_ref and ΔA(k, t) < ΔA_ref are
satisfied. If either criterion is not satisfied, the alignment is unacceptable and the conformer
is rejected. For the HPQ sequence, the reference values are rmsd_ref = 0.50 Å (three atoms per
residue, Cα, C and N, were used for the alignment) and ΔA_ref = 50°.
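The two acceptance criteria can be sketched as follows. The source does not say how conformers are superposed before the rmsd is computed; this sketch assumes an optimal least-squares superposition (the Kabsch algorithm) over the matched Cα, C and N atoms. It also wraps each dihedral difference onto [0°, 180°], which the formula for ΔA(k, t) does not state explicitly; without wrapping, 179° and -179° would count as 358° apart. All function names are illustrative.

```python
import numpy as np


def superposed_rmsd(P, Q):
    """RMSD between matched (n, 3) atom sets after optimal superposition (Kabsch)."""
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(P.T @ Q)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0   # exclude improper rotations
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))


def mean_dihedral_diff(phi_k, psi_k, phi_t, psi_t):
    """Delta A(k, t): mean absolute phi and psi difference over m residues, in degrees."""
    def wrapped(x, y):
        # Assumption: wrap differences onto [0, 180] so 179 vs -179 counts as 2 degrees.
        return np.abs((np.subtract(x, y) + 180.0) % 360.0 - 180.0)
    m = len(phi_k)
    return float((wrapped(phi_k, phi_t).sum() + wrapped(psi_k, psi_t).sum()) / (2 * m))


def alignment_ok(P, Q, phi_k, psi_k, phi_t, psi_t, rmsd_ref=0.50, da_ref=50.0):
    """Both criteria must hold; defaults are the reference values given for HPQ."""
    return (superposed_rmsd(P, Q) < rmsd_ref
            and mean_dihedral_diff(phi_k, psi_k, phi_t, psi_t) < da_ref)
```

For the HPQ motif, `P` and `Q` would each hold the 9 matched backbone atoms (3 residues × Cα, C, N) of the modeled conformer and of the crystal template.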

[0132] If the alignment is acceptable, a van der Waals check against streptavidin is performed
as the second step to determine whether the final docking is successful. If any atom pair
between the conformer and the target protein collides, the docking is not successful and the
conformer is rejected. The atomic radii used for this van der Waals check are the same as those
described above. If there is no van der Waals violation for any atom pair, the conformer is
considered to be successfully docked into the protein. The "binding ratio" can be defined as
Nb/Nt, where Nb is the number of conformers that can be successfully docked into the HPQ binding
pocket and Nt is the total number of conformers in the ensemble. This ratio correlated well
with the experimentally measured binding affinity of the complex.
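The binding ratio Nb/Nt can be sketched in Python as below, assuming each conformer has already been superposed on the template or rejected by the alignment criteria (marked here as None). The single 3.0 Å cutoff in the hypothetical `docks_without_clash` is a placeholder for the atom-pair-specific minimum distances used in the text; rejected conformers still count toward Nt.

```python
from typing import Optional, Sequence

import numpy as np


def docks_without_clash(peptide_xyz, protein_xyz, min_dist=3.0):
    """Hard-sphere test: no peptide-protein atom pair closer than min_dist (Angstrom)."""
    d = np.linalg.norm(peptide_xyz[:, None, :] - protein_xyz[None, :, :], axis=-1)
    return bool(np.all(d >= min_dist))


def binding_ratio(aligned: Sequence[Optional[np.ndarray]], protein_xyz, min_dist=3.0):
    """Nb / Nt. `aligned` holds one entry per conformer in the ensemble: its coordinates
    after superposition on the template, or None if the alignment criteria rejected it."""
    n_b = sum(1 for xyz in aligned
              if xyz is not None and docks_without_clash(xyz, protein_xyz, min_dist))
    return n_b / len(aligned)
```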
Example 11
Cluster analysis of the HPQ sequence

[0133] The inventors have surveyed a series of peptide-streptavidin complexes. Table 5 lists the
peptides and their experimentally measured binding affinities for streptavidin. Ensembles of
conformers for all of these peptides were generated following the above procedures. FIG. 10
gives an example of the ensemble for the peptide CCHPQCGMVEEC. The HPQ sequence of the peptide
is crucial for binding, so it is necessary to know what fraction of the modeled conformers can
adopt a type-I β-turn in the HPQ sequence. The crystal structure of CCHPQCGMVEEC (FIG. 9),
determined at 1.46 Å resolution, was used as the template to calculate that fraction. All of the
modeled conformers were aligned to the HPQ sequence of the CCHPQCGMVEEC crystal structure using
the reference values rmsd_ref = 0.50 Å and ΔA_ref = 50°. If both the calculated rmsd(k, t) and
ΔA(k, t) of a conformer are less than the reference values, the conformer is said to be
"HPQ-like", i.e., similar to the crystal structure in the HPQ sequence. In other words, its
modeled HPQ sequence adopts a type-I β-turn. The percentage of conformers satisfying the
criteria is listed in Table 5.

[0134] The fraction of HPQ-like conformers for the linear peptides (around 6%) is about 2-7
times smaller than for the peptides with a disulfide bond (12%-42%) (Table 5). The reason is that
the linear peptides are not restrained in conformational space and can adopt many conformations,
whereas the conformations of the disulfide-bonded peptides are constrained. The HPQ-like ratio
for the linear peptides does not vary much. The ratio for the cyclic peptides varies with the
type and number of amino acids between the two cysteines. The peptides AECHPQFNCIEGRK and
AECHPQFPCIEGRK differ only at residue 8, yet their ratios differ significantly: 22.4% for the
former and 41.9% for the latter. Having a proline at residue 8 greatly increases the chance of
forming a type-I β-turn in the HPQ motif.
TABLE 5

                                Observed                     Modeled
Peptide                     Kd (µM)    1/Kd (µM⁻¹)     HPQ-like (%)    Binder fb (%)
(illegible)                 ?          ?               ?               1.3
FSHPQNT (1)                 ?          ?               ?               1.4
AHPQFPAEIC                  ?          ?               6               1.50
AHPQFGAEiO                  204        0.005           6               0.85
AECHPQGPPCIEGRK (2)         0.23       4.35            24.9            11.3
AECHPQFPCIEGRK (2)          ?          1.08            41.9            28.7
AECHPQFNCIEGRK              7          0.14            22.4            11.3
AECHPQFCIEGRK               0.47       2.13            16.1            6.6
cyclo-CHPQFC (2)            0.27       3.70            18.7            16.9
cyclo-CHPQGPPC (2)          0.67       1.49            23.5            12.0
cyclo-(AHPQFPAE)K (4)       0.13       7.69            ?               ?
cyclo-(AHPQFGAE)K (4)       19         0.05            12              7.1
cyclo-(AQYGHFAE)K           >5000      0.0002          -               -
RCCHPQCGMVEEC (3)           1.3        3.3             27.8            7.5
RCCHPQCGMAEEC (3)           2.3        0.45            25.3            7.2
RCCHPQFEPCMGC               0.33       3.0             19.6            7.4

? = illegible in the source; - = not reported.

Experimentally observed binding and the modeled "binding ratio" fb.

1. Weber et al. (1992) Biochemistry 31, 9350-9354.

2. Giebel et al. (1995) Biochemistry 34, 15340-15435.

3. Schmidt et al. (1996) J. Mol. Biol. 255, 735-766.

4. Zang et al. (1998) Bioorganic & Medicinal Chemistry Letters 8, 2327-2332.
[0135] In the peptides CCHPQCGMVEEC and CCHPQCGMAEEC, the first two cysteines are too close to
each other to form a disulfide bond. The disulfide combinations that can form are the crossed
form, C1-C6 and C2-C12, and the nested form, C2-C6 and C1-C12. The crossed form has a higher
percentage of HPQ-like conformers than the nested form, which is attributed to the smaller
first loop of the nested form: Zhang and Snyder (1989) showed that the equilibrium constant Kc
for forming CXXXC is smaller than that for forming CXXXXC. The first loop of the crossed form,
CCHPQC, adopts a type-I β-turn in the HPQ sequence more often. When Ala is replaced by Val in the
two-loop peptides, the fraction of HPQ-like conformers increases: the Cβ-branched amino acid
further limits the conformation of the HPQ sequence, which enhances the ratio of HPQ-like
conformers.

Example 12 

The "binding ratio" of peptide-streptavidin complex 

[0136] The X-ray co-crystal structures show that all of the peptides bind to streptavidin at the
same site, and the HPQ sequence is crucial for binding. When the HPQ motif of a modeled
conformer is similar to that of the corresponding crystal structure, the conformer has the
potential to bind to streptavidin. Each HPQ-like conformer is aligned to the HPQ sequence of the
co-crystal structure; if it has no van der Waals collision with the target protein, it is
defined as a "binder". The larger the fraction of binders in the ensemble, the higher the
binding affinity of the complex. The last column of Table 5 gives the percentage of binders in
each ensemble. The fraction of binders correlates with the experimentally measured binding
affinity across the series of peptides. Only a very small percentage of linear-peptide
conformers are accepted by streptavidin (0.85% to 1.1%), compared with the cyclic or
disulfide-bonded peptides (7% to 28.7%), and the measured binding affinity of the linear
peptides is also much lower than that of the other peptides. This is attributed to an entropy
effect: the linear peptides are not constrained in conformational space and therefore lose more
entropy when they bind to the target protein. Consequently, both the measured binding affinity
and the calculated binder fraction are very low for the linear peptides.

[0137] The last two peptides listed in Table 5 were selected from a phage display library. Each
contains two disulfide bonds, so their conformations are more restricted than those of the
peptides with one disulfide bond. One might therefore expect an even higher affinity than for
the cyclic peptides, but the measured binding affinity is actually lower than that of some of
the cyclic peptides. The modeled binder fraction shows the same behavior as the measured
affinity. This may be caused by the geometry of the binding site in this system: although the
peptide is more rigid and has a higher fraction of HPQ-like conformers, it is more likely to
collide with streptavidin because the miniprotein is too large to fit properly into the
environment of the binding site. The penalty from collisions outweighs the advantage gained from
the rigidity of the peptides. The number of times each residue collided with streptavidin was
counted, with each peptide atom counted as colliding with streptavidin at most once. FIG. 11
shows the number of collisions for each residue of the two disulfide-bonded peptides. The second
loop, containing residues 7-11 (GMVEE), collides with streptavidin more often than the other
residues.
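Counting collisions per residue, with each peptide atom counted at most once per conformer, might look like the Python sketch below. The function names and the 3.0 Å clash cutoff are illustrative assumptions, not values from the source.

```python
from collections import Counter

import numpy as np


def residue_collisions(peptide_xyz, residue_ids, protein_xyz, min_dist=3.0):
    """Count clashing peptide atoms per residue; each atom is counted at most once."""
    d = np.linalg.norm(peptide_xyz[:, None, :] - protein_xyz[None, :, :], axis=-1)
    clashing = np.any(d < min_dist, axis=1)   # one flag per peptide atom
    return Counter(r for r, hit in zip(residue_ids, clashing) if hit)


def ensemble_collisions(conformers, residue_ids, protein_xyz, min_dist=3.0):
    """Sum the per-residue collision counts over all aligned conformers."""
    total = Counter()
    for xyz in conformers:
        total += residue_collisions(xyz, residue_ids, protein_xyz, min_dist)
    return total
```

Collapsing each atom's contacts to a single flag with `np.any` implements the "at most once" rule: an atom touching several protein atoms still adds one count to its residue.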

Example 13 

The correlation of the measured binding affinity with the modeled "binding ratio" 

[0138] The Gibbs free energy change for a ligand binding to a protein can be written as

$$\Delta G_m = -RT \ln K_a$$

where K_a is the measured association constant, R is the gas constant and T is the temperature.
The measured free energy is assumed to be linearly related to the modeled "free energy":

$$\Delta G_f = m\,\Delta G_c + b, \qquad \Delta G_c = -RT \ln\left[\frac{f_b}{1 - f_b}\right]$$

where f_b is the modeled "binding" fraction. The slope m and the intercept b are determined by
minimizing the sum of squared differences

$$\mathrm{Res} = \sum_{i=1}^{N}\left(\Delta G_{f,i} - \Delta G_{m,i}\right)^{2}$$

where N is the total number of peptides listed in Table 5. FIG. 12 shows the correlation of the
"binding ratio" with the observed binding constant K_a; the straight line is the fit obtained by
minimizing Res.
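The free-energy conversion and least-squares fit can be sketched with NumPy as below. The gas constant in kcal/(mol·K) and T = 298.15 K are assumptions, the function names are illustrative, and f_b must be supplied as a fraction rather than a percentage.

```python
import numpy as np

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K); assumed units


def dg_measured(ka, temperature=298.15):
    """Delta G_m = -RT ln Ka."""
    return -R_KCAL * temperature * np.log(np.asarray(ka, dtype=float))


def dg_modeled(fb, temperature=298.15):
    """Delta G_c = -RT ln[fb / (1 - fb)], with fb given as a fraction (percent / 100)."""
    fb = np.asarray(fb, dtype=float)
    return -R_KCAL * temperature * np.log(fb / (1.0 - fb))


def fit_line(dg_c, dg_m):
    """Slope m and intercept b minimizing sum (m * dg_c + b - dg_m)^2."""
    m, b = np.polyfit(dg_c, dg_m, 1)
    return float(m), float(b)
```

Because ln(c·K_a) = ln c + ln K_a, the concentration units chosen for K_a only shift every ΔG_m by the same constant, which the fitted intercept b absorbs; the slope m is unaffected.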

Example 14 

Using MPMOD to develop toxin-based inhibitors of viral entry 

[0139] Compounds are being developed to inhibit attachment and/or replication of alphaviruses,
flaviviruses and arenaviruses. These pathogenic RNA viruses are potential biological weapons and
are of general medical concern. Mouse brain membrane receptor preparations are used to select
Langat virus variants that do not bind. The E protein genes of these variants are sequenced to
find mutated regions that identify the nucleotides responsible for binding. The recombinant
protein is expressed and subjected to X-ray crystallographic structure determination. The cell
receptor is also identified at this time by screening a cDNA library for binding to the Langat E
protein, with binding detected by immunoreactivity. Candidate cDNAs will be screened further to
identify open reading frames. The putative receptor will be expressed in Sf9 cells, and its
identity will be confirmed by the ability of Langat virus to infect transfected cells. The
domain of the Langat E protein containing the site of receptor interaction will be overexpressed
to provide material for phage display screening. Phage display technology is used to identify
toxin-based compounds that bind tightly to domain ED and/or the E protein and interfere with
attachment and subsequent viral entry into the cell. Determining the structure of the cell
receptor allows additional templates for phage display to be constructed. Identified compounds
will be tested for anti-Langat activity in Vero cells, then in the mouse model by
intraperitoneal and aerosol challenge.

[0140] The inventors have determined that spiperone, a dopamine D2-subtype receptor antagonist,
competes with Japanese encephalitis virus for binding to mouse brain MRP (membrane receptor
preparations). Toxin-based antiviral compounds are being designed based on families of
10-45-residue, disulfide-rich, conformationally constrained toxins, including apamin, tertiapin,
sarafotoxin and the conotoxins, and on the human hormone endothelin. Constrained peptide loops
and more rigid toxin-based molecules are used because their structural restraints reduce the
loss of conformational entropy upon binding and thus increase binding affinity, extend the
compound's bioavailability by reducing its sensitivity to serum proteases, and increase the
specificity of interaction for a single target by eliminating conformations that might bind to
human proteins. The optimization of toxin analog sequences can be rationally guided by NMR
solution structure determination to identify changes in conformation and dynamics. Because phage
display technology is used, once a sequence is identified as an effective antiviral compound,
variants can be quickly optimized against related Langat E proteins and the envelope proteins of
similar viruses. Although the inhibitors adopt the rigid scaffold of the toxins, the identified
sequences differ greatly from those of the wild-type toxins, eliminating any intrinsic toxicity.
The use of disulfide-bridged loop peptides and structured toxin-based libraries restricts the
conformational space sampled by each sequence.

[0141] A rational, structure-based incremental approach is pursued in parallel with a strictly
blind combinatorial methodology. Phage display libraries containing random octamer sequences
constrained at their ends by a disulfide bond are prepared. Tight-binding loop peptides are
synthesized and tested for inhibition of viral entry. The crystal structures of inhibitory loop
peptides in complex with the E protein are determined. Using MPMOD, a compact folded structure is
designed to stabilize the observed loop conformation. That peptide is synthesized and tested for
binding and for inhibitory effects on viral infection. Binding interactions are optimized using a
phage display library of related sequences.

[0142] Antiviral agents are screened in vitro in a cell culture assay. Monkey kidney Vero cell
cultures are pre-treated with different concentrations of the test agent before infection with
various dilutions of Langat virus. After infection, the cells are overlaid with agar containing
the test agent at the same concentration. Cultures are incubated and subsequently stained to
quantify virus plaque formation in agent-treated versus mock-treated cultures. Any agent that
reduces virus plaque formation by 90% or more is studied further.
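The 90% plaque-reduction cutoff amounts to a simple comparison with the mock-treated control. A minimal Python sketch with hypothetical function names, assuming plaque counts are taken from matched virus dilutions:

```python
def plaque_reduction(treated_plaques: float, mock_plaques: float) -> float:
    """Fraction by which plaque formation is reduced relative to mock-treated cultures."""
    if mock_plaques <= 0:
        raise ValueError("mock-treated cultures must show plaques to compute a reduction")
    return 1.0 - treated_plaques / mock_plaques


def passes_screen(treated_plaques: float, mock_plaques: float, threshold: float = 0.90) -> bool:
    """True if the agent reduces plaque formation by 90% or more."""
    return plaque_reduction(treated_plaques, mock_plaques) >= threshold


print(passes_screen(treated_plaques=4, mock_plaques=80))   # 95% reduction -> True
print(passes_screen(treated_plaques=20, mock_plaques=80))  # 75% reduction -> False
```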

[0143] In vivo model studies use 4-week-old outbred NIH Swiss mice treated with the test agent
one day before, on the day of, and on each of the four days following Langat virus challenge.
Different concentrations and routes (intraperitoneal and intranasal) of agent administration are
examined with intraperitoneal virus challenge. The mean day of death of treated mice is compared
with that of mock-treated mice to determine the efficacy of the test agents. Any potential agent
is tested further for its ability to protect against aerosol challenge.
CLAIMS

We claim: 

1. A computer-assisted method for use in modifying a protein, said method comprising
the steps of: 

i. generating random conformational angles from a set of protein sequence data; 

ii. generating a protein backbone using the conformational angles;

iii. performing a van der Waals calculation of the protein backbone; 

iv. calculating a solvent accessible surface based energy of conformers that are 
generated in steps i-iii; 

v. modeling disulfide bonds in the protein backbone; 

vi. performing a van der Waals calculation for the disulfide bonds; 

vii. calculating a solvent accessible surface based energy of conformers that are 
generated in steps i-vi; and 

viii. creating the modified protein with structural characteristics found in the above
steps.

2. The method of claim 1, wherein the modified protein has increased stability. 

3. The method of claim 1, further comprising determining coordinate pairs of the 
disulfide bonds. 

4. The method of claim 1, wherein the conformational angles are φ, ψ, or ω.

5. The method of claim 1, further comprising determining the number of conformers that 
are able to form disulfide bonds. 

6. The method of claim 1, further comprising adding rotamers to each residue in the 
protein backbone. 

7. The method of claim 1, further comprising performing a binding test for each 
conformer with a template molecule. 

8. The method of claim 1, further comprising calculating the rate of disulfide bond loop 
closure. 

9. The method of claim 8, wherein determining the rate of disulfide bond loop closure in 
the protein comprises the steps of 

i. performing a van der Waals calculation on a multiplicity of conformers of the 
protein and subtracting those conformers that can not form an intramolecular disulfide to 
yield an ensemble of N0 sterically allowed conformers;

ii. analyzing the ensemble of sterically allowed conformers to yield an ensemble
of Nc conformers that are capable of forming an intramolecular disulfide bond; and

iii. calculating the ratio Nc/N0 which represents the rate of disulfide bond loop
closure in the peptide. 

10. A method of protein miniaturization comprising modeling a protein to have the 
necessary active site conformation using the method of claim 1 while reducing the 
total number of amino acids in the protein. 

11. A method of increasing binding affinity between a protein and a template molecule by
decreasing the conformational entropy loss upon binding by the protein comprising 
the constraint of at least one loop of an unstable region of the protein in 
conformational space using the method of claim 1. 

12. A computer-readable storage medium having stored therein a software program which 
executes the steps of claim 1. 

13. A computer-readable storage medium having stored therein a software program which 
executes the steps of claim 9. 

14. A modified-protein with increased binding affinity produced by the method 
comprising the steps of: 

i. performing the steps of the method of claim 1; and 

ii. performing a binding test for each conformer with a template molecule; and 

iii. creating the modified protein using structural characteristics found in the 
above steps to increase binding affinity in the modified protein. 

15. A modified-protein produced by the method comprising the steps of: 

i. performing the steps of the method of claim 1; and 

ii. creating the modified protein using structural characteristics found in the 
above steps to increase stability of the modified protein. 

16. A computer-assisted method for use in modifying a protein comprising the steps of: 

i. generating random conformational angles in allowed region of Ramachandran 
maps from a set of protein sequence data; 

ii. generating a protein backbone using said conformational angles; 
iii. performing van der Waals calculation for each backbone atom;

iv. determining disulfide bonds in the protein backbone; 

v. adding a rotamer to residues in the backbone; 

vi. performing van der Waals calculation for rotamers; 

vii. performing binding test with a template protein; 

viii. calculating solvent accessible surface based energy of all conformers that are 
generated in steps i-vii; and 

ix. creating the modified protein using structural characteristics identified in the
above steps. 

17. The method of claim 16, wherein generating the protein backbone comprises 
assigning conformation angles to each residue of the backbone. 

18. The method of claim 16, further comprising generating distance pairs between atoms. 

19. The method of claim 16, further comprising determining coordinate pairs of the 
disulfide bonds. 

20. The method of claim 16, wherein the backbone atoms are N, CA, or C. 

21. The method of claim 16, wherein the conformational angles are φ, ψ, or ω.

22. A computer-readable storage medium having stored therein a software program which 
executes the steps of claim 16. 

23. A computer-assisted method for use in modifying a protein comprising the steps of: 

i. generating random conformational angles in allowed region of Ramachandran 
maps from a set of protein sequence data; 

ii. generating a protein backbone using said conformational angles; 

iii. adding a rotamer to residues in the backbone; 

iv. determining disulfide bonds in the protein backbone; 

v. linking the method to a computer assisted program that calculates linear
conformers;

vi. calculating solvent accessible surface based energy of conformers that are
generated in steps i-iv; and

vii. creating the modified protein using structural characteristics identified in the
above steps.

24. The method of claim 23, wherein the computer assisted program that calculates linear 
conformers is COREX. 

25. The method of claim 23, wherein the conformational angles are φ, ψ, or ω.

26. A computer-readable storage medium having stored therein a software program which 
executes the steps of claim 23. 

27. A method for determining the rate of disulfide bond loop closure in a protein 
comprising at least one two-cysteine motif represented by C-Xn-C where n is an
integer, the method comprising the steps of: 

i. performing a van der Waals calculation on a multiplicity of conformers of the 
protein and subtracting those conformers that can not form an intramolecular disulfide to 
yield an ensemble of N0 sterically allowed conformers;

ii. analyzing the ensemble of sterically allowed conformers to yield an ensemble 
of N c conformers that are capable of forming an intramolecular disulfide bond; and 

iii. calculating the ratio Nc/N0 which represents the rate of disulfide bond loop
closure in the peptide. 

28. The method of claim 27 wherein the rate is compared to the rate of disulfide-bond 
loop closure of the protein containing at least one different two-cysteine motif. 

29. The method of claim 27 further comprising the step of generating peptide backbone 
coordinates for the C-Xn-C motif from standard bond angles, bond lengths and
dihedral angles randomly generated within the allowed regions of a Ramachandran 
map for each residue to yield the multiplicity of conformers of the protein. 

30. The method of claim 29 further comprising the step of using a side chain rotamer 
library to generate C-Xn-C side chain coordinates to yield the multiplicity of
conformers of the peptide. 
31. The method of claim 27 wherein analyzing the sterically allowed conformers
comprises calculating the free energy of the conformers based upon the solvent
accessible surface area.

32. The method of claim 31 wherein analyzing the sterically allowed conformers further
comprises flexibly modeling the cysteine side chains. 

33. The method of claim 27 further comprising the step of weighting Nc and N0 by the
difference in free energy (ΔG) between the dithiol and disulfide forms of the C-Xn-C
motif and calculating the ratio

$$\frac{\sum_{i=1}^{N_c} e^{-\Delta G_i/RT}}{\sum_{i=1}^{N_0} e^{-\Delta G_i/RT}}$$

which represents the energy-weighted rate of disulfide loop closure in the protein.

34. The method of claim 33 further comprising the step of identifying an ensemble of Nc
conformers of the protein that can potentially form an intramolecular disulfide bond.

35. The method of claim 33 wherein docking the ensemble of Nc conformers to a binding
site on a template biomolecule comprises the steps of: 

i. aligning the ensemble of Nc conformers to a binding site on a template
biomolecule to yield an ensemble of aligned conformers; and 

ii. performing a van der Waals calculation on the ensemble of aligned 
conformers to yield an ensemble of Nb sterically allowed conformers that bind to the template
biomolecule. 

36. The method of claim 35 wherein docking the ensemble of Nc conformers to a binding
site on a template biomolecule comprises the steps of: 

i. aligning the ensemble of Nc conformers to a binding site on a template
biomolecule to yield an ensemble of aligned conformers; and 

ii. performing a van der Waals calculation on the ensemble of aligned 
conformers to yield an ensemble of Nb sterically allowed conformers that bind to the template
biomolecule. 
37. The method of claim 27 wherein the protein further comprises a plurality of two-cysteine
motifs represented by C-Xn-C wherein n is independently an integer for each
two-cysteine motif.

38. A computer-readable storage medium having stored therein a software program which 
executes the steps of claim 27. 

39. A method for assessing the binding affinity of a protein to a template molecule, 
wherein the protein comprises at least one two-cysteine motif represented by C-Xn-C
where n is an integer, the method comprising: 

i. docking the ensemble of Nc conformers to a binding site on a template
biomolecule to yield an ensemble of Nb conformers that bind the template biomolecule; and

ii. calculating the ratio Nb/Nc which is indicative of the binding affinity of the
protein for the template biomolecule. 

40. A method for assessing the binding affinity of a protein to a template molecule, 
wherein the protein comprises at least one two-cysteine motif represented by C-Xn-C
where n is an integer, the method comprising the steps of: 

i. screening a population of candidate peptides comprising at least one two-cysteine
motif represented by C-Xn-C where n is an integer to yield a plurality of candidate
peptides that are capable of forming an intramolecular disulfide bond; and

ii. performing the method of claim 39 on at least one candidate peptide that is
capable of forming an intramolecular disulfide bond to assess the binding affinity of the 
candidate peptide. 

41. The method of claim 40 wherein each candidate peptide comprises a preselected
amino acid sequence. 

42. The method of claim 41 wherein the preselected amino acid sequence predisposes the 
peptide to form a desired secondary structure. 

43. The method of claim 42 wherein the desired secondary structure is a β-turn.

44. A method for modifying a protein comprising the steps of: 

i. evaluating an X-ray crystal structure or a nuclear magnetic resonance solution 
structure comprising an oxidized reference peptide bound to a target molecule, the reference 
protein comprising at least one intramolecular disulfide bond, to identify at least two amino 
acids at positions favorable to intramolecular disulfide bond formation; 
ii. substituting cysteines for the two amino acids in the reference protein to yield
a modified protein comprising at least four cysteines; 

iii. identifying an ensemble of Nc conformers of the modified protein that can
potentially form at least two intramolecular disulfide bonds; 

iv. docking the ensemble of Nc conformers to the binding site on the template
biomolecule to yield an ensemble of Nb conformers that bind the template biomolecule; 

v. calculating the ratio Nb/Nc which is indicative of the binding affinity of the
modified protein for the template biomolecule; and 

vi. repeating steps (i.)-(v.) to yield modified proteins having cysteine substitutions
at different positions and identifying modified peptides with the highest Nb/N c ratios. 

45. The method of claim 44 wherein the identifying an ensemble step comprises the steps 
of: 

i. identifying a first conformer of the protein that is capable of forming a first
intramolecular disulfide bond defining a first disulfide-bonded loop; 

ii. constraining the model by the first disulfide bond; and 

iii. identifying a second conformer of the protein that is capable of forming a
second intramolecular disulfide bond defining a second longer disulfide-bonded loop. 

46. The method of claim 44 wherein a second conformer is not identified after about 5 to 
about 10 attempts to identify said conformer, the method further comprising the steps 
of: 

i. eliminating the first disulfide bond from the model; 

ii. identifying a first conformer of the peptide that can potentially form a first 
intramolecular disulfide bond defining a different first disulfide-bonded loop; 

iii. constraining the model by the first disulfide bond; and 

iv. identifying a second conformer of the peptide that can potentially form a
second intramolecular disulfide bond defining a second longer disulfide-bonded loop. 
47. A method for assessing the binding affinity of a protein to a template molecule,
wherein the protein comprises a flexible loop, the method comprising the steps of: 

i. generating a peptide conformation of length N from a starting residue I and 
matching to a target residue I + N on the peptide model;

ii. accepting the loop conformation when the deviation between residue N and 
the target residue is small; 

iii. closing the loop using a geometric minimization method; 

iv. selecting the residue conformation by the method of claim 29; 

v. generating an ensemble of surface loops; and 

vi. estimating the binding affinity by testing the docking of the full mini-protein
ensemble and peptide target containing the loop ensemble.
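Steps i. and ii. of claim 47 (grow a segment of N residues from residue I and accept it when its last residue lands close to target residue I + N) can be illustrated with the sketch below. It swaps the full backbone for a simplified Cα-only virtual-bond model: the 3.8 Å Cα-Cα spacing is standard for trans peptides, but the 85-150° virtual bond-angle range, the tolerance, and all function names are assumptions. The geometric closure of step iii is not shown.

```python
import math
import random
from typing import List, Tuple

Vec = Tuple[float, float, float]

CA_CA = 3.8  # virtual Calpha-Calpha bond length in angstroms (trans peptide)


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _unit(a: Vec) -> Vec:
    n = math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)
    return (a[0] / n, a[1] / n, a[2] / n)


def place(a: Vec, b: Vec, c: Vec, bond: float, angle: float, torsion: float) -> Vec:
    """NeRF: new atom d with |cd| = bond, angle b-c-d = angle, dihedral a-b-c-d = torsion (radians)."""
    bc = _unit(_sub(c, b))
    n = _unit(_cross(_sub(b, a), bc))
    m = _cross(n, bc)
    d = (-bond * math.cos(angle),
         bond * math.sin(angle) * math.cos(torsion),
         bond * math.sin(angle) * math.sin(torsion))
    return tuple(c[k] + d[0] * bc[k] + d[1] * m[k] + d[2] * n[k] for k in range(3))


def sample_loops(anchor: List[Vec], target: Vec, n_res: int, tol: float,
                 trials: int, seed: int = 0) -> List[List[Vec]]:
    """Grow n_res Calpha atoms from the last three anchor atoms; keep loops ending within tol of target."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(trials):
        chain = list(anchor[-3:])
        for _ in range(n_res):
            angle = math.radians(rng.uniform(85.0, 150.0))  # assumed virtual-angle range
            torsion = math.radians(rng.uniform(-180.0, 180.0))
            chain.append(place(chain[-3], chain[-2], chain[-1], CA_CA, angle, torsion))
        if math.dist(chain[-1], target) <= tol:  # step ii: small end-point deviation
            accepted.append(chain[3:])
    return accepted


if __name__ == "__main__":
    anchor = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0)]
    loops = sample_loops(anchor, target=(8.0, 6.0, 4.0), n_res=4, tol=1.0, trials=2000)
    print(f"{len(loops)} of 2000 trial loops ended within 1.0 A of the target")
```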

48. A protein produced by protein miniaturization comprising modeling a protein to have 
the necessary active site conformation using the method of claim 1 while reducing the 
total number of amino acids in the protein. 

49. A protein capable of docking into a binding site wherein the conformation of a portion 
of said protein was constrained by the introduction of a disulfide bond by the method
of claim 1.

50. A protein, created by the method of claim 1, having the characteristic of inhibiting the 
binding of a virus to a cell wherein the protein is based upon a tertiary structure of a 
toxin and comprises at least one loop constrained by a disulfide. 

51. An ensemble of intramolecular disulfide bond-forming conformers of said loop from 
the protein of claim 50. 

52. A protein having decreased conformational entropy loss upon binding to a template 
molecule in comparison to the naturally occurring protein due to the constraint of at 
least one loop of an unstable region of a protein in conformational space by the 
formation of a disulfide bond other than disulfide bonds found in the naturally 
occurring protein using the method of claim 1.

53. An ensemble of intramolecular disulfide bond-forming conformers of said loop of the 
protein of claim 52. 

54. A protein modified by the method of claim 44. 
55. A computer system for designing a modified protein, said system comprising:

i. a database containing a set of protein sequence data;

ii. a software program coupled with said database, the software program adapted
for performing the steps of:

(a) generating random conformational angles from the set of protein
sequence data,

(b) generating a protein backbone using the conformational angles, 

(c) performing a van der Waals calculation of the protein backbone, 

(d) calculating a solvent accessible surface based energy of conformers 
that are generated in steps (a)-(c),

(e) modeling disulfide bonds in the protein backbone; 

(f) performing a van der Waals calculation for the disulfide bonds; 

(g) calculating a solvent accessible surface based energy of conformers 
that are generated in steps (a)-(f); and

(h) creating the modified protein with structural characteristics found 
in the above steps. 

56. A computer system for designing a modified protein, said system comprising:

i. a database containing a set of protein sequence data; 

ii. a software program coupled with said database, the software program adapted
for performing the steps of:

(a) randomly generating conformational angles in allowed regions of
Ramachandran maps from the set of protein sequence data;

(b) generating a protein backbone using said conformational angles;

(c) adding a rotamer to residues in the backbone; 

(d) determining disulfide bonds in the protein backbone; 

(e) calculating linear conformers; 

(f) calculating solvent accessible surface based energy of conformers that 
are generated in steps (a) - (d); and 

(g) creating the modified protein using structural characteristics identified 
in the above steps. 

57. The computer system of claim 56, wherein the calculating step includes linking to an 
external program for calculating conformers. 
A flow chart of the program MPMOD
(the fast mode)

Input: peptide sequence, disulfide bond connectivity and other 
parameters 



201: Randomly generate φ, ψ, ω in the allowed regions of the Ramachandran
maps (Gly, Pro, CB-branched, and all other amino acids). Assign the
angles to each residue of the backbone.
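Sampling φ, ψ, ω from residue-class-specific allowed regions might look like the sketch below. The rectangular "allowed" boxes are illustrative assumptions (MPMOD's actual regions are not given here), as is the near-trans ω window; choosing a box uniformly ignores the boxes' relative areas, which a production sampler would likely weight.

```python
import random
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (phi_min, phi_max, psi_min, psi_max), degrees

# Coarse rectangular "allowed" regions per residue class. These boxes are
# ILLUSTRATIVE ASSUMPTIONS, not the regions used by MPMOD.
ALLOWED: Dict[str, List[Box]] = {
    "GLY": [(-180.0, -45.0, 100.0, 180.0), (-120.0, -45.0, -70.0, -10.0),
            (45.0, 120.0, -60.0, 60.0), (45.0, 180.0, 100.0, 180.0)],
    "PRO": [(-90.0, -50.0, 110.0, 170.0), (-90.0, -50.0, -60.0, -10.0)],
    "CB_BRANCHED": [(-160.0, -90.0, 100.0, 170.0), (-120.0, -50.0, -70.0, -20.0)],
    "OTHER": [(-170.0, -50.0, 100.0, 180.0), (-120.0, -45.0, -70.0, -10.0),
              (45.0, 75.0, 20.0, 70.0)],
}
CB_BRANCHED = {"VAL", "ILE", "THR"}


def residue_class(resname: str) -> str:
    """Map a three-letter residue name to one of the four Ramachandran classes."""
    if resname in ("GLY", "PRO"):
        return resname
    return "CB_BRANCHED" if resname in CB_BRANCHED else "OTHER"


def sample_backbone_angles(sequence: List[str], rng: random.Random) -> List[Tuple[float, float, float]]:
    """Return one (phi, psi, omega) triple in degrees per residue."""
    angles = []
    for res in sequence:
        phi_lo, phi_hi, psi_lo, psi_hi = rng.choice(ALLOWED[residue_class(res)])
        phi = rng.uniform(phi_lo, phi_hi)
        psi = rng.uniform(psi_lo, psi_hi)
        omega = rng.uniform(175.0, 185.0)  # assumed near-trans peptide bond
        if omega > 180.0:
            omega -= 360.0  # wrap into (-180, 180]
        angles.append((phi, psi, omega))
    return angles


if __name__ == "__main__":
    seq = ["CYS", "GLY", "PRO", "VAL"]
    for res, (phi, psi, omega) in zip(seq, sample_backbone_angles(seq, random.Random(7))):
        print(f"{res}: phi={phi:7.1f} psi={psi:7.1f} omega={omega:7.1f}")
```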
Generate backbone atoms N, CA, C starting from three given atoms.
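Building N, CA, C from three given atoms and the sampled dihedrals can be sketched with the natural extension reference frame (NeRF) construction, which places each new atom from the three preceding atoms plus a bond length, bond angle, and dihedral. The bond lengths and angles below are commonly used ideal (Engh-Huber-style) values assumed for illustration; `place`, `build_backbone`, and `first_residue` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]

# Assumed ideal backbone geometry (angstroms, degrees); MPMOD's own values are not given here.
B_N_CA, B_CA_C, B_C_N = 1.458, 1.525, 1.329
A_N_CA_C, A_CA_C_N, A_C_N_CA = 111.2, 116.2, 121.7


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _unit(a: Vec) -> Vec:
    n = math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)
    return (a[0] / n, a[1] / n, a[2] / n)


def place(a: Vec, b: Vec, c: Vec, bond: float, angle: float, torsion: float) -> Vec:
    """NeRF: new atom d with |cd| = bond, angle b-c-d = angle, dihedral a-b-c-d = torsion (radians)."""
    bc = _unit(_sub(c, b))
    n = _unit(_cross(_sub(b, a), bc))
    m = _cross(n, bc)
    d = (-bond * math.cos(angle),
         bond * math.sin(angle) * math.cos(torsion),
         bond * math.sin(angle) * math.sin(torsion))
    return tuple(c[k] + d[0] * bc[k] + d[1] * m[k] + d[2] * n[k] for k in range(3))


def first_residue() -> Tuple[Vec, Vec, Vec]:
    """Three starting atoms N, CA, C with ideal N-CA, CA-C lengths and N-CA-C angle."""
    t = math.radians(A_N_CA_C)
    return (0.0, 0.0, 0.0), (B_N_CA, 0.0, 0.0), (B_N_CA - B_CA_C * math.cos(t), B_CA_C * math.sin(t), 0.0)


def build_backbone(first: Tuple[Vec, Vec, Vec],
                   angles: Sequence[Tuple[float, float, float]]) -> List[Tuple[Vec, Vec, Vec]]:
    """Extend (N, CA, C) of residue 0 using per-residue (phi, psi, omega) in degrees."""
    residues = [first]
    for i in range(1, len(angles)):
        n0, ca0, c0 = residues[-1]
        _, psi_prev, omega_prev = angles[i - 1]
        phi = angles[i][0]
        n1 = place(n0, ca0, c0, B_C_N, math.radians(A_CA_C_N), math.radians(psi_prev))      # psi(i-1)
        ca1 = place(ca0, c0, n1, B_N_CA, math.radians(A_C_N_CA), math.radians(omega_prev))  # omega(i-1)
        c1 = place(c0, n1, ca1, B_CA_C, math.radians(A_N_CA_C), math.radians(phi))         # phi(i)
        residues.append((n1, ca1, c1))
    return residues


if __name__ == "__main__":
    for n, ca, c in build_backbone(first_residue(), [(-60.0, -45.0, 180.0)] * 4):
        print(tuple(round(x, 2) for x in ca))
```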
Check the CA-CA and CB-CB distance pairs. CB atoms are added to the
CYS residues for this check and deleted afterwards.
→ Bad
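The CA-CA / CB-CB screen for the input disulfide connectivity can be sketched as follows. The distance windows are assumed typical values from disulfide-design heuristics, not values stated for MPMOD, and the function names are hypothetical. The CB coordinates are the temporary ones added for the check.

```python
import math
from typing import Dict, Iterable, Tuple

Vec = Tuple[float, float, float]

# Assumed screening windows (angstroms) for a disulfide-compatible Cys pair;
# typical disulfide-design values, not taken from MPMOD.
CA_CA_RANGE = (4.4, 6.8)
CB_CB_RANGE = (3.45, 4.5)


def ss_pair_ok(cys_a: Dict[str, Vec], cys_b: Dict[str, Vec]) -> bool:
    """True if both the CA-CA and CB-CB distances fall inside their windows."""
    d_ca = math.dist(cys_a["CA"], cys_b["CA"])
    d_cb = math.dist(cys_a["CB"], cys_b["CB"])
    return CA_CA_RANGE[0] <= d_ca <= CA_CA_RANGE[1] and CB_CB_RANGE[0] <= d_cb <= CB_CB_RANGE[1]


def connectivity_ok(coords: Dict[int, Dict[str, Vec]], pairs: Iterable[Tuple[int, int]]) -> bool:
    """Check every input disulfide pair; a single failure rejects the conformer ("Bad")."""
    return all(ss_pair_ok(coords[i], coords[j]) for i, j in pairs)


if __name__ == "__main__":
    coords = {3: {"CA": (0.0, 0.0, 0.0), "CB": (1.0, 1.0, 0.0)},
              10: {"CA": (5.5, 0.0, 0.0), "CB": (4.5, 1.0, 0.0)}}
    print(connectivity_ok(coords, [(3, 10)]))
```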
Do vdW check for the backbone atoms (N, CA, C).
→ Bad



205: Generate the rest of the backbone atoms CB, O (HN
and HCA optional). Do vdW check with all other
atoms after each atom is added.
→ Bad
Model the SS bond and record the SS coordinate
pairs for the later (final) vdW check.
→ Bad



Add rotamers (four options) to each residue except
CYS. Do vdW check after each rotamer (a group of
atoms) is added.
→ No
Obtain all the non-vdW-violating rotamers for each residue. Obtain all
the combinations of the rotamers, and also do a rotamer-pair vdW check.
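Enumerating all rotamer combinations while rejecting any that contain a clashing pair could look like the sketch below. The pairwise test `pair_ok` is a hypothetical callback standing in for the rotamer-pair vdW check; clashing pairs are precomputed once so each residue pair is tested only once. Note that the number of combinations grows multiplicatively with the number of residues.

```python
from itertools import combinations, product
from typing import Callable, List, Sequence, Tuple

PairCheck = Callable[[int, str, int, str], bool]


def rotamer_combinations(allowed: Sequence[Sequence[str]], pair_ok: PairCheck) -> List[Tuple[str, ...]]:
    """Enumerate all rotamer combinations with no pairwise vdW violation.

    allowed[i] holds the rotamers of residue i that survived the single-residue
    vdW check; pair_ok(i, ri, j, rj) is a hypothetical pairwise vdW test.
    """
    # Precompute clashing (i, ri, j, rj) tuples once per residue pair.
    clashes = set()
    for i, j in combinations(range(len(allowed)), 2):
        for ri, rj in product(allowed[i], allowed[j]):
            if not pair_ok(i, ri, j, rj):
                clashes.add((i, ri, j, rj))
    result = []
    for combo in product(*allowed):
        pairs = combinations(range(len(combo)), 2)
        if not any((i, combo[i], j, combo[j]) in clashes for i, j in pairs):
            result.append(combo)
    return result


if __name__ == "__main__":
    allowed = [["g+", "t"], ["g-", "t"], ["t"]]
    # Toy rule: residue 0 in "t" clashes with residue 1 in "t".
    ok = lambda i, ri, j, rj: not (i == 0 and j == 1 and ri == "t" and rj == "t")
    print(rotamer_combinations(allowed, ok))
```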
Write the backbone PDB file to one file, and write all the rotamer
combinations to another file.
Three options to add rotamers.
Do vdW check for all of them.



Do vdW check for all of the SS pairs with the complete
conformer. Pick the best SS bond.
→ No
→ Bad



Write the PDB file, the backbone angles, and a file with
all the input and statistical information.



Do the "binding" test for each
conformer with the receptor.
Calculate the SAS-based energy $\Delta G$ for each conformer.
Probability $\propto e^{-\Delta G/RT}$; partition function $Z = \sum e^{-\Delta G/RT}$.
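Turning per-conformer SAS-based energies into Boltzmann probabilities and a partition function can be sketched as below. It assumes ΔG in kcal/mol and T = 298.15 K by default. Energies are shifted by their minimum before exponentiation to avoid overflow, and the true ln Z is recovered afterwards; the function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K); Delta G assumed in kcal/mol


def boltzmann(delta_g: Sequence[float], temperature: float = 298.15) -> Tuple[List[float], float]:
    """Return (probabilities, ln Z) for conformer energies delta_g.

    Weights are computed as exp(-(G - G_min)/RT) for numerical stability; this
    rescales Z by exp(G_min/RT), which is undone when reporting ln Z, and
    leaves the normalized probabilities unchanged.
    """
    rt = R_KCAL * temperature
    g_min = min(delta_g)
    weights = [math.exp(-(g - g_min) / rt) for g in delta_g]
    z_shifted = sum(weights)
    log_z = math.log(z_shifted) - g_min / rt
    return [w / z_shifted for w in weights], log_z


if __name__ == "__main__":
    probs, log_z = boltzmann([-1.2, -0.4, 0.3])
    print([round(p, 3) for p in probs], round(log_z, 3))
```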



Note:

Three options for the atom-pair van der Waals contact distance check:
option 1, normal distance; option 2, extreme distance; option 3, calibrated distance.

Backbone atoms are treated separately from side-chain atoms when doing the vdW check.
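A pairwise contact check with selectable distance tables can be sketched as follows. The "normal" and "extreme" minimum contact distances follow classic Ramachandran-style limits for heavy atoms, but the exact numbers should be treated as assumptions; a "calibrated" table would be passed in the same way. Skipping atoms in the same or adjacent residues is a crude stand-in for excluding bonded neighbours. Backbone and side-chain atoms can be checked separately by calling the function on the corresponding subsets.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Tuple

Atom = Tuple[str, int, Tuple[float, float, float]]  # (element, residue index, xyz)

# Minimum contact distances (angstroms) for heavy-atom pairs; values are assumptions.
NORMAL: Dict[FrozenSet[str], float] = {
    frozenset({"C"}): 3.2, frozenset({"C", "N"}): 2.9, frozenset({"C", "O"}): 2.8,
    frozenset({"N"}): 2.7, frozenset({"N", "O"}): 2.7, frozenset({"O"}): 2.8,
}
EXTREME: Dict[FrozenSet[str], float] = {
    frozenset({"C"}): 3.0, frozenset({"C", "N"}): 2.8, frozenset({"C", "O"}): 2.7,
    frozenset({"N"}): 2.6, frozenset({"N", "O"}): 2.6, frozenset({"O"}): 2.7,
}


def vdw_clashes(atoms: Sequence[Atom], limits: Dict[FrozenSet[str], float],
                min_res_gap: int = 2) -> List[Tuple[int, int]]:
    """Return index pairs of atoms closer than their contact limit.

    Pairs whose residue indices differ by less than min_res_gap are skipped
    as a crude exclusion of covalently bonded neighbours (an assumption).
    """
    clashes = []
    for (i, (ei, ri, xi)), (j, (ej, rj, xj)) in combinations(enumerate(atoms), 2):
        if abs(ri - rj) < min_res_gap:
            continue
        limit = limits.get(frozenset({ei, ej}))  # element pairs without a limit are ignored
        if limit is not None and math.dist(xi, xj) < limit:
            clashes.append((i, j))
    return clashes


if __name__ == "__main__":
    atoms = [("C", 0, (0.0, 0.0, 0.0)), ("C", 5, (3.1, 0.0, 0.0))]
    print("normal:", vdw_clashes(atoms, NORMAL), "extreme:", vdw_clashes(atoms, EXTREME))
```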
A flow chart of the program MPMOD
(the slow mode)
Input: peptide sequence, disulfide bond connectivity and other
parameters
Randomly generate φ, ψ, ω in the allowed regions of the
Ramachandran maps (Gly, Pro, CB-branched, and all other
amino acids). Assign the angles to each residue of the
backbone.
Generate backbone atoms N, CA, C starting from
three given atoms.
Generate backbone atoms CB, O (HN and HCA
optional). Do vdW check with all other atoms
after each atom is added.
Add rotamers (three options) to each residue
including CYS. Do vdW check after each rotamer
(a group of atoms) is added (see separate page).



1. Check the CA-CA and CB-CB distance pairs.
2. Model the disulfide bonds and record the SS pairs.
3. Do vdW check for the SS pairs with the complete conformer.
→ Yes
→ No



Record the number of conformers that form the SS bond.
Calculate the SAS-based energy $\Delta G$ for each conformer.
Probability $\propto e^{-\Delta G/RT}$; partition function $Z = \sum e^{-\Delta G/RT}$.



Record the number of conformers that cannot
form the SS bond.

Link to the Corex program and calculate the SAS-based
energy $\Delta G$ for each conformer.
Probability $\propto e^{-\Delta G/RT}$; partition function $Z = \sum e^{-\Delta G/RT}$.
Write the PDB file of each
conformer (xyz-ss.pdb).
Do all the statistics (mod).
Loop generation in the program MPMOD

400: Input the two residue numbers of the flexible loop of the protein and the accuracy.
Randomly generate φ, ψ, ω in the allowed regions of the Ramachandran maps
(Gly, Pro, CB-branched, and all other amino acids).
→ Bad



402: Generate mainchain atoms N, CA, C, O, HN, HCA.



403: Do CA-CA distance check for the N and C termini.



404: Minimize the N and C termini of the conformer to match the cut ends of the
protein by tweaking the dihedral angles.
Check the handedness of the conformer (it should be the same as that of the cut ends of the target
protein).
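One simple way to compare handedness is the sign of the signed volume (scalar triple product) of four corresponding reference atoms, which flips under mirror reflection. The sketch assumes that four atoms (for example, Cα atoms flanking the cut ends) are chosen on both the conformer and the target; that choice and the function names are hypothetical.

```python
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def signed_volume(a: Vec, b: Vec, c: Vec, d: Vec) -> float:
    """Triple product (b-a) . ((c-a) x (d-a)); its sign encodes handedness."""
    ab = [b[k] - a[k] for k in range(3)]
    ac = [c[k] - a[k] for k in range(3)]
    ad = [d[k] - a[k] for k in range(3)]
    cross = (ac[1] * ad[2] - ac[2] * ad[1],
             ac[2] * ad[0] - ac[0] * ad[2],
             ac[0] * ad[1] - ac[1] * ad[0])
    return ab[0] * cross[0] + ab[1] * cross[1] + ab[2] * cross[2]


def same_handedness(conformer: Sequence[Vec], target: Sequence[Vec]) -> bool:
    """Compare four corresponding reference atoms from the loop conformer and the target."""
    return signed_volume(*conformer) * signed_volume(*target) > 0


if __name__ == "__main__":
    pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (1.0, 1.0, 1.0)]
    mirror = [(x, y, -z) for x, y, z in pts]
    print(same_handedness(pts, pts), same_handedness(pts, mirror))
```

A near-zero volume (coplanar reference atoms) makes the test unreliable, so in practice one would likely require a minimum magnitude.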
Do mainchain atom-pair vdW check.
Align the conformer to the target.
Do mainchain and protein atom-pair vdW check.
Add side chains and do vdW check for all atom pairs.



410: Write the closed PDB file.
Modeling the disulfide bond




Generate SG on the circle formed by rotation about the CA-CB bond. The range of the
rotation angle χ1 is selectable, with a step of 4°:
Option 1, 0-360°;
Option 2, |60| ± 40° and 180 ± 40°;
Option 3, |60| ± 30° and 180 ± 30°.
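The χ1 sampling options and the placement of SG on the circle swept around the CA-CB bond can be sketched as below, with χ1 taken as the N-CA-CB-SG dihedral. The CB-SG bond length (1.81 Å) and CA-CB-SG angle (114°) are assumed typical values, and the function names are hypothetical. Every SG generated this way sits at the same distance from CA, which is what confines it to a circle.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float, float]

# Assumed ideal geometry: CB-SG bond 1.81 A, CA-CB-SG angle 114 degrees.
B_CB_SG, A_CA_CB_SG = 1.81, 114.0


def chi1_values(option: int, step: int = 4) -> List[int]:
    """Chi1 angles (degrees, in [0, 360)) for the three sampling options."""
    if option == 1:
        return list(range(0, 360, step))
    half_width = {2: 40, 3: 30}[option]
    values = []
    for center in (60, -60, 180):  # |60| means both +60 and -60
        for offset in range(-half_width, half_width + 1, step):
            values.append((center + offset) % 360)
    return values


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _unit(a: Vec) -> Vec:
    n = math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)
    return (a[0] / n, a[1] / n, a[2] / n)


def place(a: Vec, b: Vec, c: Vec, bond: float, angle: float, torsion: float) -> Vec:
    """NeRF: new atom d with |cd| = bond, angle b-c-d = angle, dihedral a-b-c-d = torsion (radians)."""
    bc = _unit(_sub(c, b))
    n = _unit(_cross(_sub(b, a), bc))
    m = _cross(n, bc)
    d = (-bond * math.cos(angle),
         bond * math.sin(angle) * math.cos(torsion),
         bond * math.sin(angle) * math.sin(torsion))
    return tuple(c[k] + d[0] * bc[k] + d[1] * m[k] + d[2] * n[k] for k in range(3))


def sg_positions(n: Vec, ca: Vec, cb: Vec, option: int) -> List[Vec]:
    """SG candidates on the circle swept by rotating about the CA-CB bond."""
    return [place(n, ca, cb, B_CB_SG, math.radians(A_CA_CB_SG), math.radians(chi))
            for chi in chi1_values(option)]


if __name__ == "__main__":
    for opt in (1, 2, 3):
        print(f"option {opt}: {len(chi1_values(opt))} chi1 values")
```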
The "binding" test
Get the following:

1. PDB coordinates of the generated conformer and the crystal structure.
2. The sequence segment used to align the two.
3. Criteria for the best alignment.
4. Three options for the "binding" test.
Align the conformer to the corresponding peptide in the
crystal complex.
Calculate the SAS-based energy $\Delta G$ for each
conformer. Probability $\propto e^{-\Delta G/RT}$; partition function $Z = \sum e^{-\Delta G/RT}$.